From patchwork Thu Apr 16 21:46:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493793 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5EA2C14DD for ; Thu, 16 Apr 2020 21:46:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 45835221F4 for ; Thu, 16 Apr 2020 21:46:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="Y6OM9PRd" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728431AbgDPVqf (ORCPT ); Thu, 16 Apr 2020 17:46:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48862 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1726456AbgDPVqe (ORCPT ); Thu, 16 Apr 2020 17:46:34 -0400 Received: from mail-pg1-x544.google.com (mail-pg1-x544.google.com [IPv6:2607:f8b0:4864:20::544]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB6E6C061A0C for ; Thu, 16 Apr 2020 14:46:34 -0700 (PDT) Received: by mail-pg1-x544.google.com with SMTP id e12so36009pgj.6 for ; Thu, 16 Apr 2020 14:46:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=/itYSMNpKstDNGP+csfYL+XP5hOJoVD3pIaqUC5WKo4=; b=Y6OM9PRdA9dwgKfvgZKouWhzVg0FPu1NJi22pb0yPQF6sXDxxSl/oSVdfzMDyWV3e4 tIceL6hB1DvPMpSZEamJfSzZtvApelt1oyCQIvZk3/K0eJlQr5cNkiV4TnaHI/JzYLu9 Yqh6NnZsAiHBF7sTtcZZ6QyrgU0hJ8e/W+kB2tw/FcFcCdZt42L7cCVRV538S4AO7YYA KWOAvDvzv0foOqZUPCGDkTvLq2BLHYIFvC6LPiNEI1adt/sWG9btPPEF7jRvIURXkfdx 
EB8UNOau/+VzEIE5aCMeVq8MCezk0o2dgu4HcNZuoQRtE2JYmcuL4rTWEXleqESYtm9r rvIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=/itYSMNpKstDNGP+csfYL+XP5hOJoVD3pIaqUC5WKo4=; b=TZtqPYtqjg8xL6jIoT1bJauMYyd37zCGO7JMOyqhcWJwqdEdUxcP+g1Nc0c6LwtdEM ACFN0G2p+VtrXyedwzwcLp1AV2CJPiBxOdIj1pr2VYgTRhFnhCVdroEj6Aa3qkJatFTn Hnj3GGnUvgfF2CLCjnmxBbmam7GmiZLLoNiLEMtFywDwXV9aC72W3TlkWKrmOBxXdCM4 ZqUHpzfTOIRu8TBQLuCvz9Jrny+PGP/Z30e+Gq+9z86t0VG4bkS4+eVqE4DejpgNLFKn Ut3LX25YUsO0EIKVuTOsiwyh50L6raxuQS7NBz0qs6GaZkDdoxggz7xeLhnHjdzs4rqn t7Fw== X-Gm-Message-State: AGi0PubjoNlLip1ozatp5VyY6fGZVtiKyOX9KXfWktVaSkfjpeguKdmM MbfK6eYcI9a6/qZOtCyzZYPlvI6sxw8= X-Google-Smtp-Source: APiQypKxYZaKQyJEItnB9G6oWCOzAopUvXvfutHe8S57apaWaZVCFvbxTPDqmp8ErI+LfXNOcd/NkQ== X-Received: by 2002:a63:d46:: with SMTP id 6mr8501589pgn.434.1587073594084; Thu, 16 Apr 2020 14:46:34 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:33 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 01/15] block: add bio_for_each_bvec_all() Date: Thu, 16 Apr 2020 14:46:11 -0700 Message-Id: X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval An upcoming Btrfs fix needs to know the original size of a non-cloned bio. Rather than accessing the bvec table directly, let's add a bio_for_each_bvec_all() accessor. 
Signed-off-by: Omar Sandoval Reviewed-by: Johannes Thumshirn --- .clang-format | 1 + Documentation/block/biovecs.rst | 2 ++ include/linux/bio.h | 8 ++++++++ 3 files changed, 11 insertions(+) diff --git a/.clang-format b/.clang-format index 6ec5558b516b..1d6e39ad2454 100644 --- a/.clang-format +++ b/.clang-format @@ -80,6 +80,7 @@ ForEachMacros: - 'ax25_uid_for_each' - '__bio_for_each_bvec' - 'bio_for_each_bvec' + - 'bio_for_each_bvec_all' - 'bio_for_each_integrity_vec' - '__bio_for_each_segment' - 'bio_for_each_segment' diff --git a/Documentation/block/biovecs.rst b/Documentation/block/biovecs.rst index ad303a2569d3..36771a131b56 100644 --- a/Documentation/block/biovecs.rst +++ b/Documentation/block/biovecs.rst @@ -129,6 +129,7 @@ Usage of helpers: :: bio_for_each_segment_all() + bio_for_each_bvec_all() bio_first_bvec_all() bio_first_page_all() bio_last_bvec_all() @@ -143,4 +144,5 @@ Usage of helpers: bio_vec' will contain a multi-page IO vector during the iteration:: bio_for_each_bvec() + bio_for_each_bvec_all() rq_for_each_bvec() diff --git a/include/linux/bio.h b/include/linux/bio.h index c1c0f9ea4e63..c506b26f273f 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -169,6 +169,14 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter, #define bio_for_each_bvec(bvl, bio, iter) \ __bio_for_each_bvec(bvl, bio, iter, (bio)->bi_iter) +/* + * Iterate over all multi-page bvecs. Drivers shouldn't use this version for the + * same reasons as bio_for_each_segment_all(). 
+ */ +#define bio_for_each_bvec_all(bvl, bio, i) \ + for (i = 0, bvl = bio_first_bvec_all(bio); \ + i < (bio)->bi_vcnt; i++, bvl++) \ + #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len) static inline unsigned bio_segments(struct bio *bio) From patchwork Thu Apr 16 21:46:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493797 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D88926CA for ; Thu, 16 Apr 2020 21:46:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C2217221F7 for ; Thu, 16 Apr 2020 21:46:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="ualXZmN2" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728441AbgDPVqi (ORCPT ); Thu, 16 Apr 2020 17:46:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48866 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1726456AbgDPVqh (ORCPT ); Thu, 16 Apr 2020 17:46:37 -0400 Received: from mail-pl1-x643.google.com (mail-pl1-x643.google.com [IPv6:2607:f8b0:4864:20::643]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3DDF6C061A0C for ; Thu, 16 Apr 2020 14:46:36 -0700 (PDT) Received: by mail-pl1-x643.google.com with SMTP id k18so125426pll.6 for ; Thu, 16 Apr 2020 14:46:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=q/mzJgos1XgRNub/iLArtVT11nSXAVRo/sPPRfNFoYA=; b=ualXZmN2Hcfi4j39kglPHWtAfyGoA0sVV87C8mT3ys2zFkNbTnXyUQRkcjpTbRS53H 
becuxBTtd5k17XvGEf6cHmJyY2GL6v0fI66JDOn97bQijslyrNFJ7HFNODvTmd3yFX5o RONXgwtc9g/t8p+hx6G7OG4XfGx3fQkJ4S/l4X5rgcYZMrSczj7vxy6r2JoIFUlG8R52 IXQlVpQXiNGtu7D73zYfFxKxslZN2fr5FG8SHYQRaS9u5DOPutD2Ah9OyS2RVUsPQxdL ODhDw5cxOoWuWVLHEjjI+z112WJKgdiMaSe+zQDH2QMYFgy8NwXzTN2MYkqvOgWFWxjy hIEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=q/mzJgos1XgRNub/iLArtVT11nSXAVRo/sPPRfNFoYA=; b=Y6tzMzXBd/PyF6MOJIhBXzoIKLmdyGASpyaKwMZXhT+QtFK0tcKlEssLa/jbpaVQ9p TlpsL5ary1AZVJLev4ftyJ9Vx7MzstYliaCvWcR392M90qaD3qYHe9RA9fkrRMN/Ntxo XvJH1EbKrkfUCNADGm3VtW99cB40QYpt04XzClB882VYqmbdCI/46APsGe8h1qedQU15 NKWrTnr8jdqbz/k8ectPfrMuNVdpxOXIMH8S6tnE8aTKhZs29+oSNnSdyO15KgC1Gjrf LzO6LEyK2sa0RnP6PLXyVYTdwWex5kN2cWRK13PdaQrFHFhdZ6kCC8sZjRgKULRuIEgh ZzdA== X-Gm-Message-State: AGi0PuaTKNzpnSydLCRezZ6LFQ9gXf3WXfhdpKBAj/vPpF0IpecNiBjd YzQjhNluk39HPi39t8Q65xapHp8xB1M= X-Google-Smtp-Source: APiQypIglHbhRYwjv7Mtv4fDNsAxRWKr/WNnEOADkmjHUQ/lSGM4/zKRQCLumImlxPwNL/yk+n/TVA== X-Received: by 2002:a17:90a:1f4b:: with SMTP id y11mr473243pjy.136.1587073595444; Thu, 16 Apr 2020 14:46:35 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:34 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 02/15] btrfs: fix error handling when submitting direct I/O bio Date: Thu, 16 Apr 2020 14:46:12 -0700 Message-Id: <6953c2afd9ba307568ca03604c76a767320d56ca.1587072977.git.osandov@fb.com> X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval In 
btrfs_submit_direct_hook(), if a direct I/O write doesn't span a RAID stripe or chunk, we submit orig_bio without cloning it. In this case, we don't increment pending_bios. Then, if btrfs_submit_dio_bio() fails, we decrement pending_bios to -1, and we never complete orig_bio. Fix it by initializing pending_bios to 1 instead of incrementing later. Fixing this exposes another bug: we put orig_bio prematurely and then put it again from end_io. Fix it by not putting orig_bio. After this change, pending_bios is really more of a reference count, but I'll leave that cleanup separate to keep the fix small. Fixes: e65e15355429 ("btrfs: fix panic caused by direct IO") Reviewed-by: Nikolay Borisov Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval Reviewed-by: Johannes Thumshirn --- fs/btrfs/inode.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 259239b33370..b628c319a5b6 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7939,7 +7939,6 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip) /* bio split */ ASSERT(geom.len <= INT_MAX); - atomic_inc(&dip->pending_bios); do { clone_len = min_t(int, submit_len, geom.len); @@ -7989,7 +7988,8 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip) if (!status) return 0; - bio_put(bio); + if (bio != orig_bio) + bio_put(bio); out_err: dip->errors = 1; /* @@ -8030,7 +8030,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, bio->bi_private = dip; dip->orig_bio = bio; dip->dio_bio = dio_bio; - atomic_set(&dip->pending_bios, 0); + atomic_set(&dip->pending_bios, 1); io_bio = btrfs_io_bio(bio); io_bio->logical = file_offset; From patchwork Thu Apr 16 21:46:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493795 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org 
[172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4DECE14DD for ; Thu, 16 Apr 2020 21:46:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 304FD221F9 for ; Thu, 16 Apr 2020 21:46:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="uQK186AS" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728443AbgDPVqi (ORCPT ); Thu, 16 Apr 2020 17:46:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48874 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728432AbgDPVqh (ORCPT ); Thu, 16 Apr 2020 17:46:37 -0400 Received: from mail-pj1-x1044.google.com (mail-pj1-x1044.google.com [IPv6:2607:f8b0:4864:20::1044]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D5F14C061A0F for ; Thu, 16 Apr 2020 14:46:37 -0700 (PDT) Received: by mail-pj1-x1044.google.com with SMTP id ng8so134138pjb.2 for ; Thu, 16 Apr 2020 14:46:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=79OcIzy+6HmtVaphSQyJYy+wHlPhCxgL24bRSUnXELA=; b=uQK186ASynPJVw5oZ0lTcr/ni4fFWJowDoY881ZtDevq4A7HMcmjPsN4AGrQfvqz9N +qqzH7CmxLpQ9ifyB1LaHNJpR+eutLjSvy8Rsd4v/go6JARh2LwzBMBIe1cyIG1EdYUL 4+kT/96JmiYPkwfkPraxjj+YDotI8GH4me+yClt4b6mVgQ+l5qnmPbdoE3B0ZrQGoFA4 pD0uNfqu8eX4hlkRG3bC0K/KyT3/KlGXqWFb3+Nb5PfWBOP6KJMgqdCR7LHmvtR46Mt2 SRjFj4+P22zrBH0jrylOBljp+2nvUWDb2St7BZR8WVk1kjeDc4TUGqyvHEW5+ps8dnW5 vnjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
bh=79OcIzy+6HmtVaphSQyJYy+wHlPhCxgL24bRSUnXELA=; b=k6mMKm76eOw8bis2XgTl7btUqWt/Qq7Ql/AetoMQazE/7ad5uejUswBfdLiVwdOHkE FWRNUfnmQ7zlS306zQ4iBDODop24fY1lXUzLu5vLRQIqegV2cxEhZ5WK6wGBHt4caOwm DCUgKT91WLz0ZyKCclu/DudJ0eceT3T7jl77NHF1K7qkXlRPJz+e0K7DWNoPVQWoleYw da7VRa+3CTIm5qJHJ1kavqo/P905H4y5nZCk85htgkvb4APt1a+BvDJ1XEVZxWmdVrn4 x79yTTOhx4mi37133sfG2VOkUFVKcRfhWgFeB+iPkj05OtbIypqu1qkRfqFKNe5gWneJ kaBg== X-Gm-Message-State: AGi0PuaxT+lPbsRpolLXFUxvAGbQYbt0z7RfIaQsuNaMyjCWrOhvblul 5bb2xW69loF/yYax5uVNgZQHebl0nS0= X-Google-Smtp-Source: APiQypLkMsXR3cj3lu5+np8hu2Zqy1QjOz7aqEmFLMXTm63/EeYr99FsmsB5R6U40JB6YekK+kpHow== X-Received: by 2002:a17:902:b783:: with SMTP id e3mr267722pls.217.1587073596935; Thu, 16 Apr 2020 14:46:36 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:36 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O Date: Thu, 16 Apr 2020 14:46:13 -0700 Message-Id: <594c8cb6dd64cebdf5e01016ce823e1be00fc7ab.1587072977.git.osandov@fb.com> X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval In btrfs_submit_direct(), if we fail to allocate the btrfs_dio_private, we complete the ordered extent range. However, we don't mark that the range doesn't need to be cleaned up from btrfs_direct_IO() until later. Therefore, if we fail to allocate the btrfs_dio_private, we complete the ordered extent range twice. 
We could fix this by updating unsubmitted_oe_range earlier, but it's cleaner to reorganize the code so that creating the btrfs_dio_private and submitting the bios are separate, and once the btrfs_dio_private is created, cleanup always happens through the btrfs_dio_private. Fixes: f28a49287817 ("Btrfs: fix leaking of ordered extents after direct IO write error") Signed-off-by: Omar Sandoval Reviewed-by: Johannes Thumshirn Reviewed-by: Nikolay Borisov , however please see --- fs/btrfs/inode.c | 174 ++++++++++++++++++----------------------------- 1 file changed, 66 insertions(+), 108 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index b628c319a5b6..f6ce9749adb6 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7903,14 +7903,60 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, return ret; } -static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip) +/* + * If this succeeds, the btrfs_dio_private is responsible for cleaning up locked + * or ordered extents whether or not we submit any bios. 
+ */ +static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio, + struct inode *inode, + loff_t file_offset) { - struct inode *inode = dip->inode; + const bool write = (bio_op(dio_bio) == REQ_OP_WRITE); + struct btrfs_dio_private *dip; + struct bio *bio; + + dip = kzalloc(sizeof(*dip), GFP_NOFS); + if (!dip) + return NULL; + + bio = btrfs_bio_clone(dio_bio); + bio->bi_private = dip; + btrfs_io_bio(bio)->logical = file_offset; + + dip->private = dio_bio->bi_private; + dip->inode = inode; + dip->logical_offset = file_offset; + dip->bytes = dio_bio->bi_iter.bi_size; + dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9; + dip->orig_bio = bio; + dip->dio_bio = dio_bio; + atomic_set(&dip->pending_bios, 1); + + if (write) { + struct btrfs_dio_data *dio_data = current->journal_info; + + dio_data->unsubmitted_oe_range_end = dip->logical_offset + + dip->bytes; + dio_data->unsubmitted_oe_range_start = + dio_data->unsubmitted_oe_range_end; + + bio->bi_end_io = btrfs_endio_direct_write; + } else { + bio->bi_end_io = btrfs_endio_direct_read; + dip->subio_endio = btrfs_subio_endio_read; + } + return dip; +} + +static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, + loff_t file_offset) +{ + const bool write = (bio_op(dio_bio) == REQ_OP_WRITE); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_dio_private *dip; struct bio *bio; - struct bio *orig_bio = dip->orig_bio; - u64 start_sector = orig_bio->bi_iter.bi_sector; - u64 file_offset = dip->logical_offset; + struct bio *orig_bio; + u64 start_sector; int async_submit = 0; u64 submit_len; int clone_offset = 0; @@ -7919,11 +7965,24 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip) blk_status_t status; struct btrfs_io_geometry geom; + dip = btrfs_create_dio_private(dio_bio, inode, file_offset); + if (!dip) { + if (!write) { + unlock_extent(&BTRFS_I(inode)->io_tree, file_offset, + file_offset + dio_bio->bi_iter.bi_size - 1); + } + dio_bio->bi_status = 
BLK_STS_RESOURCE; + dio_end_io(dio_bio); + return; + } + + orig_bio = dip->orig_bio; + start_sector = orig_bio->bi_iter.bi_sector; submit_len = orig_bio->bi_iter.bi_size; ret = btrfs_get_io_geometry(fs_info, btrfs_op(orig_bio), start_sector << 9, submit_len, &geom); if (ret) - return -EIO; + goto out_err; if (geom.len >= submit_len) { bio = orig_bio; @@ -7986,7 +8045,7 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip) submit: status = btrfs_submit_dio_bio(bio, inode, file_offset, async_submit); if (!status) - return 0; + return; if (bio != orig_bio) bio_put(bio); @@ -8000,107 +8059,6 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip) */ if (atomic_dec_and_test(&dip->pending_bios)) bio_io_error(dip->orig_bio); - - /* bio_end_io() will handle error, so we needn't return it */ - return 0; -} - -static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, - loff_t file_offset) -{ - struct btrfs_dio_private *dip = NULL; - struct bio *bio = NULL; - struct btrfs_io_bio *io_bio; - bool write = (bio_op(dio_bio) == REQ_OP_WRITE); - int ret = 0; - - bio = btrfs_bio_clone(dio_bio); - - dip = kzalloc(sizeof(*dip), GFP_NOFS); - if (!dip) { - ret = -ENOMEM; - goto free_ordered; - } - - dip->private = dio_bio->bi_private; - dip->inode = inode; - dip->logical_offset = file_offset; - dip->bytes = dio_bio->bi_iter.bi_size; - dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9; - bio->bi_private = dip; - dip->orig_bio = bio; - dip->dio_bio = dio_bio; - atomic_set(&dip->pending_bios, 1); - io_bio = btrfs_io_bio(bio); - io_bio->logical = file_offset; - - if (write) { - bio->bi_end_io = btrfs_endio_direct_write; - } else { - bio->bi_end_io = btrfs_endio_direct_read; - dip->subio_endio = btrfs_subio_endio_read; - } - - /* - * Reset the range for unsubmitted ordered extents (to a 0 length range) - * even if we fail to submit a bio, because in such case we do the - * corresponding error handling below and it must not be done a second - 
* time by btrfs_direct_IO(). - */ - if (write) { - struct btrfs_dio_data *dio_data = current->journal_info; - - dio_data->unsubmitted_oe_range_end = dip->logical_offset + - dip->bytes; - dio_data->unsubmitted_oe_range_start = - dio_data->unsubmitted_oe_range_end; - } - - ret = btrfs_submit_direct_hook(dip); - if (!ret) - return; - - btrfs_io_bio_free_csum(io_bio); - -free_ordered: - /* - * If we arrived here it means either we failed to submit the dip - * or we either failed to clone the dio_bio or failed to allocate the - * dip. If we cloned the dio_bio and allocated the dip, we can just - * call bio_endio against our io_bio so that we get proper resource - * cleanup if we fail to submit the dip, otherwise, we must do the - * same as btrfs_endio_direct_[write|read] because we can't call these - * callbacks - they require an allocated dip and a clone of dio_bio. - */ - if (bio && dip) { - bio_io_error(bio); - /* - * The end io callbacks free our dip, do the final put on bio - * and all the cleanup and final put for dio_bio (through - * dio_end_io()). - */ - dip = NULL; - bio = NULL; - } else { - if (write) - __endio_write_update_ordered(inode, - file_offset, - dio_bio->bi_iter.bi_size, - false); - else - unlock_extent(&BTRFS_I(inode)->io_tree, file_offset, - file_offset + dio_bio->bi_iter.bi_size - 1); - - dio_bio->bi_status = BLK_STS_IOERR; - /* - * Releases and cleans up our dio_bio, no need to bio_put() - * nor bio_endio()/bio_io_error() against dio_bio. 
- */ - dio_end_io(dio_bio); - } - if (bio) - bio_put(bio); - kfree(dip); } static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info, From patchwork Thu Apr 16 21:46:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493801 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 18EAB174A for ; Thu, 16 Apr 2020 21:46:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 00ECC221F7 for ; Thu, 16 Apr 2020 21:46:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="Znx6AbTt" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728450AbgDPVqk (ORCPT ); Thu, 16 Apr 2020 17:46:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48878 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728432AbgDPVqk (ORCPT ); Thu, 16 Apr 2020 17:46:40 -0400 Received: from mail-pj1-x1036.google.com (mail-pj1-x1036.google.com [IPv6:2607:f8b0:4864:20::1036]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E68B7C061A0C for ; Thu, 16 Apr 2020 14:46:38 -0700 (PDT) Received: by mail-pj1-x1036.google.com with SMTP id cl8so131085pjb.3 for ; Thu, 16 Apr 2020 14:46:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=7URV1UXPA4e6eDFDJgHNM1H+5oNq+PRNWisnAs9Npn0=; b=Znx6AbTtASxUQz0JnVHgwk9HslE39nBt8PoYmhS5MbR5a6qHNJP1DYk5ssyZgLhlNl brJAikPVRArH3phVJpfKj6lZa6wDAensZOS9b/LXnIiilv2ULbbhAzya2offO6bJetBe 
Yax29TeIinLUp5HovBGW4fzF8/N/GJXU2dh1mV+4bUd54/t3qrz8Wfjov6frc7vKHKbP +VdbJeiAfSVhLNnfJzzobSc5Mz5vU9aSgllhQ/92Xh9Fon7Skdqsi4xWeGYD6jQNbiFd lMCcTXPZmEFAGQD7ZCBr9gythgYKgOefldQpeuwMyNoL96AUonXAVs0aA1Rn/2i0+ddP UxLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=7URV1UXPA4e6eDFDJgHNM1H+5oNq+PRNWisnAs9Npn0=; b=cUUqBAW5nlwXW+rX2/YtcC1O0EFtU3Z6H0AipbpqDa+Q5mu7hieDoM21Hqqx6odtJ6 KzbJnjaRWF+qG8Qj1PA/PGoBh+h6Xb63R4zF8y+YKRN0GzTaSDCjQ1H0NVccYaG5QMh3 18PUPj/jh2qV+1/cnGbte0w1boSZWBX3CbErzt0eqVxW+rAWY4URFsF3iGq4exhe5OfC foIELX7+hGEsCCa8MCwmxHZOFMLDIjVydHB0s2IarOtdSFJDVIf6mjO6dFmU8gJ/RzD1 25Qvz/Z0CWbi2F0OKZfhJbBNyeMSXmhxApuiMhmzDe2HdC7VmDUnm8M2zFNFyWFpaAF4 qsZA== X-Gm-Message-State: AGi0Pua2hpPHl3Qeb9orgQesXwdKaqHYzWj1+TdrlNWJR6aOgOGDm9Ca ucncECUvIz9dO3zKfd3oOuTYljUtgHU= X-Google-Smtp-Source: APiQypIdIwt7l3WeT7Pg+ZyJOtZolew/bpA6CwcOtFarU1PMjASX/I5D18TzYXzcQXXSZdwAHUQJpQ== X-Received: by 2002:a17:90a:6fe4:: with SMTP id e91mr500132pjk.28.1587073598097; Thu, 16 Apr 2020 14:46:38 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:37 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 04/15] btrfs: look at full bi_io_vec for repair decision Date: Thu, 16 Apr 2020 14:46:14 -0700 Message-Id: X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval Read repair does two things: it finds a good copy of data to return to the reader, and it corrects the bad copy on disk. 
If a read of multiple sectors has an I/O error, repair does an extra "validation" step that issues a separate read for each sector. This allows us to find the exact failing sectors and only rewrite those. This heuristic is implemented in bio_readpage_error()/btrfs_check_repairable() as: failed_bio_pages = failed_bio->bi_iter.bi_size >> PAGE_SHIFT; if (failed_bio_pages > 1) do validation However, at this point, bi_iter may have already been advanced. This means that we'll skip the validation step and rewrite the entire failed read. Fix it by getting the actual size from the biovec (which we can do because this is only called for non-cloned bios, although that will change in a later commit). Fixes: 8a2ee44a371c ("btrfs: look at bi_size for repair decisions") Signed-off-by: Omar Sandoval Reviewed-by: Johannes Thumshirn --- fs/btrfs/extent_io.c | 33 +++++++++++++++++++++++++++------ fs/btrfs/extent_io.h | 5 +++-- 2 files changed, 30 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 39e45b8a5031..712f49607d3a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2537,8 +2537,9 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end, return 0; } -bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages, - struct io_failure_record *failrec, int failed_mirror) +bool btrfs_check_repairable(struct inode *inode, bool need_validation, + struct io_failure_record *failrec, + int failed_mirror) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); int num_copies; @@ -2561,7 +2562,7 @@ bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages, * a) deliver good data to the caller * b) correct the bad sectors on disk */ - if (failed_bio_pages > 1) { + if (need_validation) { /* * to fulfill b), we need to know the exact failing sectors, as * we don't want to rewrite any more than the failed ones. 
thus, @@ -2633,6 +2634,24 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio, return bio; } +static bool btrfs_io_needs_validation(struct inode *inode, struct bio *bio) +{ + struct bio_vec *bvec; + u64 len = 0; + int i; + + /* + * We need to validate each sector individually if the failed I/O was + * for multiple sectors. + */ + bio_for_each_bvec_all(bvec, bio, i) { + len += bvec->bv_len; + if (len > inode->i_sb->s_blocksize) + return true; + } + return false; +} + /* * This is a generic handler for readpage errors. If other copies exist, read * those and write back good data to the failed position. Does not investigate @@ -2647,11 +2666,11 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset, struct inode *inode = page->mapping->host; struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; + bool need_validation; struct bio *bio; int read_mode = 0; blk_status_t status; int ret; - unsigned failed_bio_pages = failed_bio->bi_iter.bi_size >> PAGE_SHIFT; BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); @@ -2659,13 +2678,15 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset, if (ret) return ret; - if (!btrfs_check_repairable(inode, failed_bio_pages, failrec, + need_validation = btrfs_io_needs_validation(inode, failed_bio); + + if (!btrfs_check_repairable(inode, need_validation, failrec, failed_mirror)) { free_io_failure(failure_tree, tree, failrec); return -EIO; } - if (failed_bio_pages > 1) + if (need_validation) read_mode |= REQ_FAILFAST_DEV; phy_offset >>= inode->i_sb->s_blocksize_bits; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 2ed65bd0760e..26c0fce0bb64 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -312,8 +312,9 @@ struct io_failure_record { }; -bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages, - struct io_failure_record *failrec, int fail_mirror); +bool 
btrfs_check_repairable(struct inode *inode, bool need_validation, + struct io_failure_record *failrec, + int failed_mirror); struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio, struct io_failure_record *failrec, struct page *page, int pg_offset, int icsum, From patchwork Thu Apr 16 21:46:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493799 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B7ADB14DD for ; Thu, 16 Apr 2020 21:46:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A0707221F7 for ; Thu, 16 Apr 2020 21:46:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="AHuZiJ/I" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728462AbgDPVqk (ORCPT ); Thu, 16 Apr 2020 17:46:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48882 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728445AbgDPVqk (ORCPT ); Thu, 16 Apr 2020 17:46:40 -0400 Received: from mail-pg1-x542.google.com (mail-pg1-x542.google.com [IPv6:2607:f8b0:4864:20::542]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0AC40C061A0F for ; Thu, 16 Apr 2020 14:46:40 -0700 (PDT) Received: by mail-pg1-x542.google.com with SMTP id w11so21212pga.12 for ; Thu, 16 Apr 2020 14:46:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=KgrlD/bFbuJ+7ycGARpz9SjCx8tAPM2/16UcN5GB4v0=; 
b=AHuZiJ/IT0n10haLQJKannFT10VsuDGjHlGrZE2TWWxDKo/wulFag8Az7bg9Kv4bwj PuC5vbVvkBZ+JeOuqv6ecMG5wjqc2a8CeAGiYhk5LqAvHWxq8roFJKTPvLEksyTHIBUx xG2V3TpaDc4+7IPga76/VZ7pZSA7V4XuL/Ssg92TOKvvyYYinMRpLrQOK/ijs1jL4YDP NCHLGrDZAIyvmWJBQjMKK/TNruZHR1+Uy+1Vt5wzAZu1Y9j7uyj9YasC8FgUchfniacp BqiXAT1elyXsoD4WDv3VP64mUMdXmOLJqmt7sxGvoX1npzxYXLZUw9+VKVulMCRQY4U7 UXkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=KgrlD/bFbuJ+7ycGARpz9SjCx8tAPM2/16UcN5GB4v0=; b=Awyl5N8mFjPbfSZTWTpwd1FzKYg6hho9F5+HMIGXtb1U2PUR4jhS5EFoR2RINNjOwb HMNQcTcx9F63JGQcRQNTPRHQF6hZ0FG5u4o3CGZJmA7kpibxi9JjbQi2/c8cHT6HIBtt pMHGCf2fYXDPts6lXqddCOeRg11WTx4M6alYMzF7FV2KfD8omg5vT6RTWeo21n47W00M EeHzoSllACU9nFyPR770egjcMgoZqJRuhrAuVsiQd4Ox/mGnP24inPkkFLBqrep9k2as dcd2v3skYif7Dv7Zy9U+VzvcVFdXT9/eVsLIKYtdcpRBAsdaf7yPo9gMNucOJdlTd/y5 w4Mw== X-Gm-Message-State: AGi0PubmcfS2SvMvZ7nRS+UO6WoqRMoYPg++x+c6q5FKj1T9Sz3+ZaNL x/lN50WQyFcxq/5B+R+2tpFsQvz4Yqo= X-Google-Smtp-Source: APiQypJRG+qprwH3A9ByVRVAJGg6bv4AXgnOCXiJYunLF/x33aPZl8DT3LX82nbP7z7fMFKzakAu8A== X-Received: by 2002:a62:3784:: with SMTP id e126mr14559469pfa.303.1587073599269; Thu, 16 Apr 2020 14:46:39 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:38 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 05/15] btrfs: don't do repair validation for checksum errors Date: Thu, 16 Apr 2020 14:46:15 -0700 Message-Id: X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval The purpose of the validation 
step is to distinguish between good and bad sectors in a failed multi-sector read. If a multi-sector read succeeded but some of those sectors had checksum errors, we don't need to validate anything; we know the sectors with bad checksums need to be repaired. Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval Reviewed-by: Johannes Thumshirn --- fs/btrfs/extent_io.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 712f49607d3a..25dd42437cbd 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2640,6 +2640,14 @@ static bool btrfs_io_needs_validation(struct inode *inode, struct bio *bio) u64 len = 0; int i; + /* + * If bi_status is BLK_STS_OK, then this was a checksum error, not an + * I/O error. In this case, we already know exactly which sector was + * bad, so we don't need to validate. + */ + if (bio->bi_status == BLK_STS_OK) + return false; + /* * We need to validate each sector individually if the failed I/O was * for multiple sectors. 
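The decision described in this patch can be modelled outside the kernel as a minimal sketch. Everything below is a hypothetical userspace stand-in (mock_status, mock_bio, mock_io_needs_validation are not kernel names), and it assumes the only inputs that matter are the completion status of the read and how many bytes it covered:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's blk_status_t values. */
enum mock_status { MOCK_STS_OK = 0, MOCK_STS_IOERR = 1 };

/* Hypothetical, simplified view of a completed read bio. */
struct mock_bio {
	enum mock_status status; /* models bio->bi_status */
	uint64_t len;            /* total bytes covered by the read */
};

/*
 * Returns true when a failed read has to be retried sector by sector
 * to find out which sectors are actually bad.
 */
static bool mock_io_needs_validation(const struct mock_bio *bio,
				     uint32_t sectorsize)
{
	/*
	 * The read itself succeeded, so the failure was a checksum
	 * mismatch and the bad sector is already known.
	 */
	if (bio->status == MOCK_STS_OK)
		return false;

	/*
	 * A failed I/O spanning more than one sector does not say which
	 * sector was bad, so each one must be validated individually.
	 */
	return bio->len > sectorsize;
}
```

With a 4096-byte sector size, a 16384-byte read that completed with MOCK_STS_OK needs no validation, while the same read completing with MOCK_STS_IOERR does.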
From patchwork Thu Apr 16 21:46:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493803 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 495EA6CA for ; Thu, 16 Apr 2020 21:46:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 31DDF221F4 for ; Thu, 16 Apr 2020 21:46:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="CUGc/cGx" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728467AbgDPVqm (ORCPT ); Thu, 16 Apr 2020 17:46:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48888 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728445AbgDPVql (ORCPT ); Thu, 16 Apr 2020 17:46:41 -0400 Received: from mail-pj1-x1043.google.com (mail-pj1-x1043.google.com [IPv6:2607:f8b0:4864:20::1043]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4D8EFC061A0C for ; Thu, 16 Apr 2020 14:46:41 -0700 (PDT) Received: by mail-pj1-x1043.google.com with SMTP id a22so124660pjk.5 for ; Thu, 16 Apr 2020 14:46:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=OvhwnRTLtN+fEx765ccxADScIuZl05CxCF78j/Vtem8=; b=CUGc/cGxQ/+am7qYJzFbTtYdejPaqwrPtvJNuVGcwGdqaQwdmjPKxJ82rFAygbY4qN HUcAbjCwhLAy+m+yi+G35NdFMCOQ/TYlYKn1VZ4hCHqV5aH8cTcKoQCWF/rN/Zj0Boq7 NaWQYuQgAn2YH3xgTr+bsQ9BFEiqxVP3l8UnHESsLo7OpaN01Akmp9wVhCOTpfyXBNHk CSZrG/X2NTD/J6/0Alfj319R5DiNEkTWwoszlB87+11/2ty/dPB/QZvTEzxFOrWcb2AM 
+XRibCjlPG9Ktp5mZa5S2ZPOVADaAZYHXtWxArbTcfNTVuAfD87X5xl+2FCFruEPpuMi Ptgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=OvhwnRTLtN+fEx765ccxADScIuZl05CxCF78j/Vtem8=; b=kMlVcRWRSTeFJUU1WesffPqGpa13uODej//vYuqA6zXit/OzumEvMKZcQMvG7Km5AP WdYI/uh+Scyk8YJGfpDYfqZxkwCYBRHnQwpAR9jjAU/O6UcsfTH40h/MvKZkxaOF0JXs y5oeHQl7k09iYZd4UfnppsR769YZQRijuVvWQx+z/vw9TPlfBSOKLVFgVG83WPJeNCX0 PCVcQN9wBCrfVIUUnXzqeWAGToQyBhdYYrHhjdz0iumN3+8zhcVY0V2yPkFsXECtHRgD aQw4vWPuOo0Glnr5EEZvQwh42cf+t/mAY9rpA8eXnTQ0iWYbiOmusEV3i53LoJKZUWgl i3rA== X-Gm-Message-State: AGi0PuZLjnmT/Sunq9pbMk542CnDnubV8Y5CTWAuml2dNLmnEyf6SZQl 2JoErRT0/yx3nvu8l9HQxJ2IfRn5Mn4= X-Google-Smtp-Source: APiQypJjCkSPlNHULjmXHFLfA4KhTJZ5wwDxXi2BDPlXa5y/ZGnA/aIa+Vkx92PEWWc+Ck+NEoonkQ== X-Received: by 2002:a17:90b:3444:: with SMTP id lj4mr432651pjb.37.1587073600440; Thu, 16 Apr 2020 14:46:40 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:39 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 06/15] btrfs: clarify btrfs_lookup_bio_sums documentation Date: Thu, 16 Apr 2020 14:46:16 -0700 Message-Id: <51f9f9b917ded1057f1d24f7bbf7546492f3ea98.1587072977.git.osandov@fb.com> X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval Fix a couple of issues in the btrfs_lookup_bio_sums documentation: * The bio doesn't need to be a btrfs_io_bio if dst was provided. Move the declaration in the code to make that clear, too. 
* dst must be large enough to hold nblocks * csum_size, not just csum_size. Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval Reviewed-by: Johannes Thumshirn Reviewed-by: Nikolay Borisov --- fs/btrfs/file-item.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index b618ad5339ba..22cbb4da6d42 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -242,11 +242,13 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans, /** * btrfs_lookup_bio_sums - Look up checksums for a bio. * @inode: inode that the bio is for. - * @bio: bio embedded in btrfs_io_bio. + * @bio: bio to look up. * @offset: Unless (u64)-1, look up checksums for this offset in the file. * If (u64)-1, use the page offsets from the bio instead. - * @dst: Buffer of size btrfs_super_csum_size() used to return checksum. If - * NULL, the checksum is returned in btrfs_io_bio(bio)->csum instead. + * @dst: Buffer of size nblocks * btrfs_super_csum_size() used to return + * checksum (nblocks = bio->bi_iter.bi_size / fs_info->sectorsize). If + * NULL, the checksum buffer is allocated and returned in + * btrfs_io_bio(bio)->csum instead. * * Return: BLK_STS_RESOURCE if allocating memory fails, BLK_STS_OK otherwise. 
*/ @@ -256,7 +258,6 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct bio_vec bvec; struct bvec_iter iter; - struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio); struct btrfs_csum_item *item = NULL; struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; struct btrfs_path *path; @@ -277,6 +278,8 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, nblocks = bio->bi_iter.bi_size >> inode->i_sb->s_blocksize_bits; if (!dst) { + struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio); + if (nblocks * csum_size > BTRFS_BIO_INLINE_CSUM_SIZE) { btrfs_bio->csum = kmalloc_array(nblocks, csum_size, GFP_NOFS); From patchwork Thu Apr 16 21:46:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493805 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 31B1E6CA for ; Thu, 16 Apr 2020 21:46:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1A52A221F7 for ; Thu, 16 Apr 2020 21:46:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="BVSkejBH" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728478AbgDPVqo (ORCPT ); Thu, 16 Apr 2020 17:46:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728471AbgDPVqn (ORCPT ); Thu, 16 Apr 2020 17:46:43 -0400 Received: from mail-pl1-x644.google.com (mail-pl1-x644.google.com [IPv6:2607:f8b0:4864:20::644]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 32C33C061A0C for ; Thu, 16 Apr 
2020 14:46:42 -0700 (PDT) Received: by mail-pl1-x644.google.com with SMTP id v2so118762plp.9 for ; Thu, 16 Apr 2020 14:46:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=mPoL16KD4zJkxaql5M14RhRv+rZlvV4Dkvjaf7+tRhk=; b=BVSkejBHp8H9h5v7HKHxXY3+Erhk5uft256J8S/p3AttXoEgyXZaQ9QUkulzr7Iy2E o08h4gCFoGmDhhDT7dWkrbOD3cQwnzmxjWkGOqsDZFm09RROLcYSacEHphslq2m/2AyW AHJIEeBT/9nQZ0Q1ry6xGQXEkzJlIhv7+/7Yli0zdneS4D7fKN/xZT6E/dqv+pv3az6L 2rE1QE8pwAT9mS6pL+9NiK5u2rn3g39wJR9E4RvQs7pq2OQCCUcNRRsFjkKgf6aXeS/P 204ogmr1W8QVvXmHonM3Yin9wd7f7BNqiFB+p/w/mPC79XDEx8tVK6EfbXiTeZDR6ER4 ELZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=mPoL16KD4zJkxaql5M14RhRv+rZlvV4Dkvjaf7+tRhk=; b=NqtQu5VOheSKUKhPYTVfxWDCTtgFrZAypYkeiidETilhUfONKVNQRiWOOq+M5MVkF+ +lAXuJ+7MFxmXrVGLTUhRy97KhbVmjOtPEDgYMJjWPC0rO0RIxDJnf71A2z6OyjWgVq1 Uy0PoykGrUsEqrZrY6SotKIMup9hQA5b10VnuOPoYKhgQjw5Tr656AN6tK5WBVKs4Utl S3OMkYDK6u1tLpOn23cfl2YHjsaJcmmncBtQeCQtGgQpSq6VQcbw1kC9lypNDD/emjRb v5wlUSLrv/htrU6nfG7Qbi0k1X+ZkP/Iqw7xjASoFNvjoEwUdRmVxn8vyqW7Q8aiBukr yzGA== X-Gm-Message-State: AGi0PuaKmVIAhq2yUSyF6/4C9oiGUUTpDxGEVxggx3p0krp4oJ8UOsTE x7j7X02qITHKNYeSqDoAbT6oFhi71kQ= X-Google-Smtp-Source: APiQypL5bdU7WuI5JLv99MoSiNM+6f8dAzfGusl7iv6XYmedb5fOq7mj5FwipKBkr8j/jKN+V+/7hA== X-Received: by 2002:a17:90a:a40b:: with SMTP id y11mr482941pjp.130.1587073601439; Thu, 16 Apr 2020 14:46:41 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:41 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens 
Axboe , Christoph Hellwig Subject: [PATCH v2 07/15] btrfs: rename __readpage_endio_check to check_data_csum Date: Thu, 16 Apr 2020 14:46:17 -0700 Message-Id: <5627c63e0a8cbcb02eab1b539414f0ccc271426b.1587072977.git.osandov@fb.com> X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval __readpage_endio_check() is also used from the direct I/O read code, so give it a more descriptive name. Reviewed-by: Nikolay Borisov Reviewed-by: Josef Bacik Reviewed-by: Johannes Thumshirn Signed-off-by: Omar Sandoval --- fs/btrfs/inode.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f6ce9749adb6..eb3fcdc7c337 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2726,10 +2726,9 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start, btrfs_queue_work(wq, &ordered_extent->work); } -static int __readpage_endio_check(struct inode *inode, - struct btrfs_io_bio *io_bio, - int icsum, struct page *page, - int pgoff, u64 start, size_t len) +static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, + int icsum, struct page *page, int pgoff, u64 start, + size_t len) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); @@ -2790,8 +2789,8 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio, } phy_offset >>= inode->i_sb->s_blocksize_bits; - return __readpage_endio_check(inode, io_bio, phy_offset, page, offset, - start, (size_t)(end - start + 1)); + return check_data_csum(inode, io_bio, phy_offset, page, offset, start, + (size_t)(end - start + 1)); } /* @@ -7584,9 +7583,9 @@ static void btrfs_retry_endio(struct bio *bio) ASSERT(!bio_flagged(bio, BIO_CLONED)); bio_for_each_segment_all(bvec, bio, iter_all) { - ret = __readpage_endio_check(inode, io_bio, 
i, bvec->bv_page, - bvec->bv_offset, done->start, - bvec->bv_len); + ret = check_data_csum(inode, io_bio, i, bvec->bv_page, + bvec->bv_offset, done->start, + bvec->bv_len); if (!ret) clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree, io_tree, done->start, @@ -7636,8 +7635,9 @@ static blk_status_t __btrfs_subio_endio_read(struct inode *inode, next_block: if (uptodate) { csum_pos = BTRFS_BYTES_TO_BLKS(fs_info, offset); - ret = __readpage_endio_check(inode, io_bio, csum_pos, - bvec.bv_page, pgoff, start, sectorsize); + ret = check_data_csum(inode, io_bio, csum_pos, + bvec.bv_page, pgoff, start, + sectorsize); if (likely(!ret)) goto next; } From patchwork Thu Apr 16 21:46:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493807 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6DD7314DD for ; Thu, 16 Apr 2020 21:46:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 56F16221F7 for ; Thu, 16 Apr 2020 21:46:46 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="nUrxtyk/" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728491AbgDPVqq (ORCPT ); Thu, 16 Apr 2020 17:46:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728471AbgDPVqo (ORCPT ); Thu, 16 Apr 2020 17:46:44 -0400 Received: from mail-pg1-x544.google.com (mail-pg1-x544.google.com [IPv6:2607:f8b0:4864:20::544]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D09FC061A0F for ; Thu, 16 Apr 2020 14:46:43 -0700 (PDT) Received: by 
mail-pg1-x544.google.com with SMTP id 188so18552pgj.13 for ; Thu, 16 Apr 2020 14:46:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=0bAwaqJo49nfZ89MyQ5vav/YmbVLjZ8t78bJXkRNkEo=; b=nUrxtyk/cWVOYK605qManYjhWmmDlpyhftoOr6VQg322UTWwHESDH/KUyaYD7pDiJI Gqpv+4w9r6MYq2Uyqz6xZo+RwoSAq8C6QkHtYCsNxdT55A6c7i+ZDK97IdQMZoL3jgZh VoDPghaNIJkxWRMgPxlEcVysaO6g+2O77pSjO4mmx1jpOV7/WiZKaa2jrPTsRo1ujtdd sGGS+pG5t/ZSRiuy7kLwXxP0BSc4ZwUVMhgp/QLSI97TwfkXyQ87XGgmmvggNxn0gMFM pfl/HjI/2e6Qr8TJYnS2TXe22Op3K77LmbmHdhPBxTk9UzowBzvi210NDgzsE3qRNRln 0PYw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=0bAwaqJo49nfZ89MyQ5vav/YmbVLjZ8t78bJXkRNkEo=; b=QalYm9FXoZyLZqEKPsBQsQh3Qp2GJnWReWMZR73hzC3lReXvehtUCn2q+MyzT7tjYq +kCMXQF+op+xJdSXbyRbivaQB8K7sLMnSXcvHzn/UHgd0HsEF0+3WlG25LpfVyYj7lvt KEfZaTNtTpsc9i5KalufnYNd/JS7l3rrw2lsNS2DE0jjE0eQ/TJRqelLBHqzf76VHh0/ kqoFgPYfAcbQJIzr23SJyoOFIeyVygbhg4QvmS9KyORqldwqiiOtZmmBTfJxP2iwcBhD KaEXLIMlzEff0vrzPcYJO/uVLJPCgPiqaKvkcZC+rkXzgNZNjbbmZiUu0BZejJuBzz9T hOqg== X-Gm-Message-State: AGi0PubPiNmLGpalWwI6Eqig/ABJa6C0NLtRkXOJvvwZQ6gq3DuR6tRV puUpAW0qqMdQctRoBxHXiPhFmk0eTT0= X-Google-Smtp-Source: APiQypKWtUd1Wx+cLpg9Qn8ML1MuU4fHTe7X8ka8sFztKDudai0mpg1Ss69iRMOkkxN2a187Y6ZrpA== X-Received: by 2002:a63:c34e:: with SMTP id e14mr34064356pgd.212.1587073602620; Thu, 16 Apr 2020 14:46:42 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:42 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: 
[PATCH v2 08/15] btrfs: make btrfs_check_repairable() static Date: Thu, 16 Apr 2020 14:46:18 -0700 Message-Id: <1fb537b8b19364290de431a48a243800ce930c3b.1587072977.git.osandov@fb.com> X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval Since its introduction in commit 2fe6303e7cd0 ("Btrfs: split bio_readpage_error into several functions"), btrfs_check_repairable() has only been used from extent_io.c where it is defined. Reviewed-by: Nikolay Borisov Reviewed-by: Josef Bacik Reviewed-by: Johannes Thumshirn Signed-off-by: Omar Sandoval --- fs/btrfs/extent_io.c | 7 ++++--- fs/btrfs/extent_io.h | 3 --- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 25dd42437cbd..85e98ba349a8 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2537,9 +2537,10 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end, return 0; } -bool btrfs_check_repairable(struct inode *inode, bool need_validation, - struct io_failure_record *failrec, - int failed_mirror) +static bool btrfs_check_repairable(struct inode *inode, + bool need_validation, + struct io_failure_record *failrec, + int failed_mirror) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); int num_copies; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 26c0fce0bb64..f4dfac756455 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -312,9 +312,6 @@ struct io_failure_record { }; -bool btrfs_check_repairable(struct inode *inode, bool need_validation, - struct io_failure_record *failrec, - int failed_mirror); struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio, struct io_failure_record *failrec, struct page *page, int pg_offset, int icsum, From patchwork Thu Apr 16 21:46:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 
1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493809 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 33C4214DD for ; Thu, 16 Apr 2020 21:46:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1C7E9221F7 for ; Thu, 16 Apr 2020 21:46:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="St/lV/oA" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728483AbgDPVqp (ORCPT ); Thu, 16 Apr 2020 17:46:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48904 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728101AbgDPVqo (ORCPT ); Thu, 16 Apr 2020 17:46:44 -0400 Received: from mail-pj1-x1031.google.com (mail-pj1-x1031.google.com [IPv6:2607:f8b0:4864:20::1031]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 92D87C061A0C for ; Thu, 16 Apr 2020 14:46:44 -0700 (PDT) Received: by mail-pj1-x1031.google.com with SMTP id b7so139717pju.0 for ; Thu, 16 Apr 2020 14:46:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=yPZuR72F02rQGqhEWVqzRAKfTQzb0eGqbphi6uIjTj4=; b=St/lV/oAi7duAF6usLQZ1lN3EhwvJoOlk0MxBTA0tsMbrrCC/D0aJ4SvIo/tCg9wW0 CUiEGbvfXv6Mbmo4lm/E3h3E8s91aPok/NauvyRl8OLHgeZu2c0HfMEH3aYy/6tJs1we U56gL9whcThiE6LAnlAhsZzxmSEAJTbegscL0Y/T7FNkoUzNpnJOCNhNqZraGXaRyVwt 4+Zfy5XgywL4rljXs4vOL4ZJknyBBo2Fu7uFCL7arbXnmGzMNPdeSrrFPJoqj4WsEFVi Qz5z+Kg3y25hYOueWES4bcZelak0pv5WyP0VV7Z6YAlqTagPdnMZCKSlNGa86EqxHL9l hjdQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=yPZuR72F02rQGqhEWVqzRAKfTQzb0eGqbphi6uIjTj4=; b=KN6mdHjAa1BRXLjjSDkejYnXvAR2M0wX8urdnnrhs+Ed8oGH4cEVKp3QmpRolU+Cb+ Rz7vUrPUHWz8XObRZKpC6BP2vaEw4oEGqcawc7ZIDuw7EVB31SZtKBY4tUPsfbgRT7+P PO0NJdo2OYUFX8VqqcwTtKfbE3ccjMYCainhcburZ5ID8/F8N/wOUMlK3hqcs7J8go9M UGY+ammgHuz+03/DsMDGanTsGRz29QfuZCcTGkgCwkE8NfAxqjFXvh6DmUb49Q2aDnQ+ si7lEptNVlAa06EbScQQPVk4gFxuJ9M6lkQDpHc3L74T5knLGWOBDpQmB6C/HHfruji/ TRfw== X-Gm-Message-State: AGi0PuZHZrKf+zQbKA3css1DrNenmuLa3+VZ0bXTBTMayn7+HvMC8+bN +D0f+awOZhYpfjtxiZH+nPhLSeklCnc= X-Google-Smtp-Source: APiQypLYPwwQPDZOTvZigY2h6B9yJEBvewmAhMVcWsq7aA1i8kPe9NcU/W9XUlWF4C7c8kueXdZ+mw== X-Received: by 2002:a17:90a:6f22:: with SMTP id d31mr494776pjk.14.1587073603735; Thu, 16 Apr 2020 14:46:43 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:43 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 09/15] btrfs: kill btrfs_dio_private->private Date: Thu, 16 Apr 2020 14:46:19 -0700 Message-Id: <2b1b2a0790cad01da9887392c25e4f9b6bbf1b5c.1587072977.git.osandov@fb.com> X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval We haven't used this since commit 9be3395bcd4a ("Btrfs: use a btrfs bioset instead of abusing bio internals"). 
Reviewed-by: Nikolay Borisov Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval Reviewed-by: Johannes Thumshirn --- fs/btrfs/btrfs_inode.h | 1 - fs/btrfs/inode.c | 1 - 2 files changed, 2 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 27a1fefce508..ad36685ce046 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -301,7 +301,6 @@ struct btrfs_dio_private { u64 logical_offset; u64 disk_bytenr; u64 bytes; - void *private; /* number of bios pending for this dio */ atomic_t pending_bios; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index eb3fcdc7c337..d7cf248dd634 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7923,7 +7923,6 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio, bio->bi_private = dip; btrfs_io_bio(bio)->logical = file_offset; - dip->private = dio_bio->bi_private; dip->inode = inode; dip->logical_offset = file_offset; dip->bytes = dio_bio->bi_iter.bi_size; From patchwork Thu Apr 16 21:46:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493811 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7A90E6CA for ; Thu, 16 Apr 2020 21:46:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 62E78221F7 for ; Thu, 16 Apr 2020 21:46:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="Y8UoUhVD" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728501AbgDPVqr (ORCPT ); Thu, 16 Apr 2020 17:46:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org 
with ESMTP id S1728471AbgDPVqq (ORCPT ); Thu, 16 Apr 2020 17:46:46 -0400 Received: from mail-pg1-x543.google.com (mail-pg1-x543.google.com [IPv6:2607:f8b0:4864:20::543]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B1357C061A0C for ; Thu, 16 Apr 2020 14:46:45 -0700 (PDT) Received: by mail-pg1-x543.google.com with SMTP id g6so29306pgs.9 for ; Thu, 16 Apr 2020 14:46:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=vY5HUibLoi3EGBd4zQyj1Rq2qF7F5ToVSUzWu/dewGw=; b=Y8UoUhVDLvqBKG0Aq8CceeSgZs/NovicrLE0G2m2rzv+2/lN5BF70fuAEr/L7yAMGD U1eGUHyOsCWtbOHGYlFEL3VehitpglPIo7EkyMXZtWPNAd7jLFpGHvJRnOINUjeeVGzZ 9QJApgNPgaBymNEm/5p9mLXCQzuFXVX58pg7zbyw3+ockBj2EkLyAvX8FVAJplo0TaDh ZUkuzRVhE7agJ2fEtaqr/KteYtUoBkWtRzL3klewel5g7pryLAQO7iKiGBLJ3tnYkGGc q+XfWcNvrvb+ilgg4KjTSUyVfGOp+JuZlbZa/TLDSKzS1dHT2TYbECcyNrbvK9tZLnXM 4rKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=vY5HUibLoi3EGBd4zQyj1Rq2qF7F5ToVSUzWu/dewGw=; b=q/lmWWV4ceuQe2+oj3HSTaVoyoWReXck9W1zzWxwl/XarePtIelyWAiYoJZsxKZLlO uamJpGGnwJUX2J5MN5weMF0CWkh0QHuGNNP57H7mElWs4bfVwnBjzQNcpBj4SmY9xxZy cznbrxjZYi6ztbyOyld5ufgYxE578Sh+F+Dv/kiOt8jdfDQABjv14GvA1MtHflDhr8LH 6/og4OlT+O+Bn6q41OeuFioLYg/hERhtDQTqzjcLgCu3jl4MLmEaxMAMJZaUFMtQja5S bM+ADMHvBkkQfNhiUrbBg38IBvhSGHVjFiPQryqpM/RnFjf4HT9tY0liiOyAE3QXoQ/S CiZw== X-Gm-Message-State: AGi0PuZbWWEPSif3fsjjZFl25A2ZXWvobxpfpsnDl73wspIXO0xHmVvk UiIBUoUnMMJhJpAnFuzBgHsjAYdkuuQ= X-Google-Smtp-Source: APiQypIa/tXRzeuL2YDa6bSPT337H6JymrCYogw1aph58TPbgraC/lKduDngcdLlk70mSqJRKrXdNA== X-Received: by 2002:a63:cf4e:: with SMTP id b14mr34006297pgj.344.1587073604892; Thu, 16 Apr 2020 14:46:44 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) 
by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:44 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 10/15] btrfs: convert btrfs_dio_private->pending_bios to refcount_t Date: Thu, 16 Apr 2020 14:46:20 -0700 Message-Id: <14a8c9acc19ad08c31615616d007cc23e70ae0d0.1587072977.git.osandov@fb.com> X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval This is really a reference count now, so convert it to refcount_t and rename it to refs. Reviewed-by: Nikolay Borisov Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval Reviewed-by: Johannes Thumshirn --- fs/btrfs/btrfs_inode.h | 8 ++++++-- fs/btrfs/inode.c | 10 +++++----- 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index ad36685ce046..b965fa5429ec 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -7,6 +7,7 @@ #define BTRFS_INODE_H #include +#include #include "extent_map.h" #include "extent_io.h" #include "ordered-data.h" @@ -302,8 +303,11 @@ struct btrfs_dio_private { u64 disk_bytenr; u64 bytes; - /* number of bios pending for this dio */ - atomic_t pending_bios; + /* + * References to this structure. There is one reference per in-flight + * bio plus one while we're still setting up. 
+ */ + refcount_t refs; /* IO errors */ int errors; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d7cf248dd634..4b1102f2e6b8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7811,7 +7811,7 @@ static void btrfs_end_dio_bio(struct bio *bio) } /* if there are more bios still pending for this dio, just exit */ - if (!atomic_dec_and_test(&dip->pending_bios)) + if (!refcount_dec_and_test(&dip->refs)) goto out; if (dip->errors) { @@ -7929,7 +7929,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio, dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9; dip->orig_bio = bio; dip->dio_bio = dio_bio; - atomic_set(&dip->pending_bios, 1); + refcount_set(&dip->refs, 1); if (write) { struct btrfs_dio_data *dio_data = current->journal_info; @@ -8021,13 +8021,13 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, * count. Otherwise, the dip might get freed before we're * done setting it up. */ - atomic_inc(&dip->pending_bios); + refcount_inc(&dip->refs); status = btrfs_submit_dio_bio(bio, inode, file_offset, async_submit); if (status) { bio_put(bio); - atomic_dec(&dip->pending_bios); + refcount_dec(&dip->refs); goto out_err; } @@ -8056,7 +8056,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, * atomic operations with a return value are fully ordered as per * atomic_t.txt */ - if (atomic_dec_and_test(&dip->pending_bios)) + if (refcount_dec_and_test(&dip->refs)) bio_io_error(dip->orig_bio); } From patchwork Thu Apr 16 21:46:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493815 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C99EE14DD for ; Thu, 16 Apr 2020 21:46:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org 
(Postfix) with ESMTP id ADC64221F7 for ; Thu, 16 Apr 2020 21:46:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="cl8+w/2g" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728516AbgDPVqt (ORCPT ); Thu, 16 Apr 2020 17:46:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728471AbgDPVqs (ORCPT ); Thu, 16 Apr 2020 17:46:48 -0400 Received: from mail-pj1-x1042.google.com (mail-pj1-x1042.google.com [IPv6:2607:f8b0:4864:20::1042]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC760C061A0F for ; Thu, 16 Apr 2020 14:46:46 -0700 (PDT) Received: by mail-pj1-x1042.google.com with SMTP id t40so129404pjb.3 for ; Thu, 16 Apr 2020 14:46:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=fkubIkbm5CTH8PCM726ZLHiYyoY1I6Mws16fl26l+jU=; b=cl8+w/2g60vEsnsozNZlsix+djwjbQ1ywtXLsKFeSORay6MA7kh8efCiTpcihFiCKz OYebUQ/An6PQKzHYPlcG+kptDA9SzXKnanUYIkBRqPMODvRCQ6rQ4DsUgXCdes85PHTE HrypErUmHEqbdXhKrweZ63RWlko5TiYIsSbnffQL4OEPUKCXq2o4kw3wy7Ny00dI39lp y4Arklsn4rJfgt4DHB8sgIqQlKbW6McY2SbmAWwkjdni6/CjkJhu+46tuM/O+0WTlxXA PDVjEDTBDBAs7RQ8YCIcVSKNqKauzhmz0D2htln39FNOnaPXmIykmw7J9RYAQtW7payB iwxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=fkubIkbm5CTH8PCM726ZLHiYyoY1I6Mws16fl26l+jU=; b=ep7iMRXE/JtgznAbBo3sOVo9cUv825CkbekuQ3vA//VHvIiIKsUwFUm906rCak5akd KqgyOFQY3le5Lv8eGv3Ei12qNbhs60lJC586sGLCFr0+6hs24SnC9/q0RF84PNsAixib h5c/tYZzgTs1zPKsbp5voxStRD1d3ztYDk6m8uVlm1fOXV+Cv69ql5YllSiWxYIm8d4r 
Szleb/rpleBzdV4cOKk7bzyT+Qu6iFyLNhyevBYADW4yfmzF9oDoWUOxtqiO84PSvPKw KdXi7jqYJSpuWfImF8/zi5t9df5y0d/VLjHh70md52icWjIsZYlQ1diI70yz/zS9MvAw +NWA== X-Gm-Message-State: AGi0PubzAtSHS/tqcFUwvsLZwlz1on0VdKTf9Fu/0X3lMW66IkGng96+ k8Qt3J9mlsobYnpWwdH/JXTsAIRUDuA= X-Google-Smtp-Source: APiQypJ2Oj4lYcsrbalIEeOM7G4R0nkmqwLCAsAfX9qj0tbEUzVsCoyft6UDjy6IUZPCSOdFtU22RQ== X-Received: by 2002:a17:90a:3268:: with SMTP id k95mr395975pjb.185.1587073605918; Thu, 16 Apr 2020 14:46:45 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:45 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 11/15] btrfs: put direct I/O checksums in btrfs_dio_private instead of bio Date: Thu, 16 Apr 2020 14:46:21 -0700 Message-Id: X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval The next commit will get rid of btrfs_dio_private->orig_bio. The only thing we really need it for is containing all of the checksums, but we can easily put the checksum array in btrfs_dio_private and have the submitted bios reference the array. We can also look the checksums up while we're setting up instead of the current awkward logic that looks them up for orig_bio when the first split bio is submitted. (Interestingly, btrfs_dio_private did contain the checksums before commit 23ea8e5a0767 ("Btrfs: load checksum data once when submitting a direct read io"), but it didn't look them up up front.) 
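As a rough illustration of the layout described above, the arithmetic can be sketched in userspace C. This is not the kernel code: mock_dip_alloc_size and mock_csum_offset are hypothetical helpers, and the sketch assumes one checksum of csum_size bytes per block, stored contiguously after the fixed part of the structure:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes to allocate for the private structure: the fixed header plus,
 * for checksummed reads only, one checksum per block of the whole I/O.
 */
static size_t mock_dip_alloc_size(size_t header_size, uint64_t io_bytes,
				  unsigned int blocksize_bits,
				  uint16_t csum_size, int is_write,
				  int has_csum)
{
	size_t size = header_size;

	if (!is_write && has_csum)
		size += (size_t)(io_bytes >> blocksize_bits) * csum_size;
	return size;
}

/*
 * Byte offset into the shared checksum array for a split bio that
 * starts at file_offset, when the whole I/O starts at logical_offset.
 */
static uint64_t mock_csum_offset(uint64_t file_offset,
				 uint64_t logical_offset,
				 unsigned int blocksize_bits,
				 uint16_t csum_size)
{
	uint64_t block = (file_offset - logical_offset) >> blocksize_bits;

	return block * csum_size;
}
```

For example, with 4096-byte blocks (blocksize_bits = 12) and 4-byte checksums, a 16384-byte read needs 16 extra bytes, and a split bio starting 8192 bytes into the I/O finds its first checksum at byte offset 8.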
Reviewed-by: Nikolay Borisov Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval Reviewed-by: Johannes Thumshirn --- fs/btrfs/btrfs_inode.h | 3 ++ fs/btrfs/inode.c | 70 +++++++++++++++++++----------------------- 2 files changed, 34 insertions(+), 39 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index b965fa5429ec..94476a8be4cc 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -324,6 +324,9 @@ struct btrfs_dio_private { */ blk_status_t (*subio_endio)(struct inode *, struct btrfs_io_bio *, blk_status_t); + + /* Checksums. */ + u8 sums[]; }; /* diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 4b1102f2e6b8..fe87c465b13c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7712,7 +7712,6 @@ static void btrfs_endio_direct_read(struct bio *bio) dio_bio->bi_status = err; dio_end_io(dio_bio); - btrfs_io_bio_free_csum(io_bio); bio_put(bio); } @@ -7824,39 +7823,6 @@ static void btrfs_end_dio_bio(struct bio *bio) bio_put(bio); } -static inline blk_status_t btrfs_lookup_and_bind_dio_csum(struct inode *inode, - struct btrfs_dio_private *dip, - struct bio *bio, - u64 file_offset) -{ - struct btrfs_io_bio *io_bio = btrfs_io_bio(bio); - struct btrfs_io_bio *orig_io_bio = btrfs_io_bio(dip->orig_bio); - u16 csum_size; - blk_status_t ret; - - /* - * We load all the csum data we need when we submit - * the first bio to reduce the csum tree search and - * contention. 
- */ - if (dip->logical_offset == file_offset) { - ret = btrfs_lookup_bio_sums(inode, dip->orig_bio, file_offset, - NULL); - if (ret) - return ret; - } - - if (bio == dip->orig_bio) - return 0; - - file_offset -= dip->logical_offset; - file_offset >>= inode->i_sb->s_blocksize_bits; - csum_size = btrfs_super_csum_size(btrfs_sb(inode->i_sb)->super_copy); - io_bio->csum = orig_io_bio->csum + csum_size * file_offset; - - return 0; -} - static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, struct inode *inode, u64 file_offset, int async_submit) { @@ -7892,10 +7858,12 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, if (ret) goto err; } else { - ret = btrfs_lookup_and_bind_dio_csum(inode, dip, bio, - file_offset); - if (ret) - goto err; + u64 csum_offset; + + csum_offset = file_offset - dip->logical_offset; + csum_offset >>= inode->i_sb->s_blocksize_bits; + csum_offset *= btrfs_super_csum_size(fs_info->super_copy); + btrfs_io_bio(bio)->csum = dip->sums + csum_offset; } map: ret = btrfs_map_bio(fs_info, bio, 0); @@ -7912,10 +7880,22 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio, loff_t file_offset) { const bool write = (bio_op(dio_bio) == REQ_OP_WRITE); + const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM); + size_t dip_size; struct btrfs_dio_private *dip; struct bio *bio; - dip = kzalloc(sizeof(*dip), GFP_NOFS); + dip_size = sizeof(*dip); + if (!write && csum) { + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + u16 csum_size = btrfs_super_csum_size(fs_info->super_copy); + size_t nblocks; + + nblocks = dio_bio->bi_iter.bi_size >> inode->i_sb->s_blocksize_bits; + dip_size += csum_size * nblocks; + } + + dip = kzalloc(dip_size, GFP_NOFS); if (!dip) return NULL; @@ -7951,6 +7931,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, loff_t file_offset) { const bool write = (bio_op(dio_bio) == REQ_OP_WRITE); + const bool csum = !(BTRFS_I(inode)->flags & 
BTRFS_INODE_NODATASUM); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct btrfs_dio_private *dip; struct bio *bio; @@ -7975,6 +7956,17 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, return; } + if (!write && csum) { + /* + * Load the csums up front to reduce csum tree searches and + * contention when submitting bios. + */ + status = btrfs_lookup_bio_sums(inode, dio_bio, file_offset, + dip->sums); + if (status != BLK_STS_OK) + goto out_err; + } + orig_bio = dip->orig_bio; start_sector = orig_bio->bi_iter.bi_sector; submit_len = orig_bio->bi_iter.bi_size; From patchwork Thu Apr 16 21:46:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493813 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F1AC714DD for ; Thu, 16 Apr 2020 21:46:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CFAE2221F7 for ; Thu, 16 Apr 2020 21:46:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="GxdvYJ6M" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728525AbgDPVqt (ORCPT ); Thu, 16 Apr 2020 17:46:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48916 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728502AbgDPVqs (ORCPT ); Thu, 16 Apr 2020 17:46:48 -0400 Received: from mail-pl1-x644.google.com (mail-pl1-x644.google.com [IPv6:2607:f8b0:4864:20::644]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1E66EC061A0C for ; Thu, 16 Apr 2020 14:46:48 -0700 (PDT) Received: by mail-pl1-x644.google.com with SMTP id y22so128978pll.4 for ; Thu, 16 Apr 
2020 14:46:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=M3okAhFxQRqXu9Xbi5lkgf8F20SY8dnh0Hv+G7Nc7qg=; b=GxdvYJ6MXkdjZPnvhbAEePNkb92RXnhHJ/557YokFEye3f0QKzJzxXKDp2d0OicFLE MXUgPcT0c5J5sC9HzYBkeURfoo1CUOo0/s9sQzD+DjFYl+cN+2OTj7VrHHBR6wKPgEjH Qo0AAcCruCJUJG8yGDA6FmUhdfaJI+8iXYsI4o9jgy04wnAu+QhCUoqu2CLLFWGfCx9H LGkDsblgZ5Sp1stbiUk+lAd4WA/sI+cfLqtvOvkk9YwAIxrUaF8IOmptzITgVhV2m4al BJxpxFqqgvGwh6yFBdJoc4VYfHmXLzJD/3h1I7SvLN2fhMveFvuw4swk2xo7QMhXEkUH /Z/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=M3okAhFxQRqXu9Xbi5lkgf8F20SY8dnh0Hv+G7Nc7qg=; b=ZQyhBdWET5+vf39IxeuAxXNEoaoDbh3FdEX0X1sPMnUP2A7k3Un+FL1uVQQ1z0+F9Z vg8GrdwvE5EnP8NPI8griNOdjnjELuj/wKOd1p9Mm3pF8NIK6IN7QnzXLc+XLM6iSjPY qDNxVAefXdR96UXoqDKk9yaEqKKrPpkRapZKET+77ErRra+XWrl6QFIMRx5A7AmuIzuD CW0MBssS6Va9tJVcqFTScP+ckXwHzVJ5SRg5V8ZpOelNTsf/jqLqgdj6thv0e9vWgrVW BojhvBebBgXIwyeVEqShFjMKhrhaYR1GzeMSMN0B8LlwP9pc4FqdqhqSeTO8dnsrrpcw +Lew== X-Gm-Message-State: AGi0PuYVeYtMuA3z/2v1cC0EcxQyY6u7g8x2H/JFjrOlipBG8DE/+och +mjyqF4DAZv+4AB4K2PeLlk8SDCXH6U= X-Google-Smtp-Source: APiQypL8/P8d9rM7VRBSOZ2vplvsX/Bn1ZmQgSVEQmhK0Eb+loUgtkD0NwlUFTfyzzt7uaz0bdFF5g== X-Received: by 2002:a17:90a:9f0b:: with SMTP id n11mr374330pjp.99.1587073607060; Thu, 16 Apr 2020 14:46:47 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:46 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 12/15] btrfs: get rid of one layer of bios in direct I/O Date: Thu, 
16 Apr 2020 14:46:22 -0700 Message-Id: X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval

In the worst case, there are _4_ layers of bios in the Btrfs direct I/O path:

1. The bio created by the generic direct I/O code (dio_bio).
2. A clone of dio_bio we create in btrfs_submit_direct() to represent the entire direct I/O range (orig_bio).
3. A partial clone of orig_bio limited to the size of a RAID stripe that we create in btrfs_submit_direct_hook().
4. Clones of each of those split bios for each RAID stripe that we create in btrfs_map_bio().

As of the previous commit, the second layer (orig_bio) is no longer needed for anything: we can split dio_bio instead, and complete dio_bio directly when all of the cloned bios complete. This lets us clean up a bunch of cruft, including dip->subio_endio and dip->errors (we can use dio_bio->bi_status instead). It also enables the next big cleanup of direct I/O read repair.
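The completion scheme described above (complete dio_bio once every split bio has finished, with the status recorded on dio_bio itself) can be sketched with C11 atomics. This is a rough userspace model under stated assumptions, not the kernel implementation: struct dio_model and the dio_model_*() helpers are hypothetical, atomic_int stands in for refcount_t, and "submitting" a split only counts it. The model assumes total_len and max_len are both greater than zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the shared per-I/O state. */
struct dio_model {
	atomic_int refs; /* starts at 1, held by the submitter */
	int status;      /* 0 on success, otherwise the last error seen */
	bool completed;  /* set when the parent I/O has been completed */
};

static void dio_model_init(struct dio_model *dio)
{
	atomic_init(&dio->refs, 1);
	dio->status = 0;
	dio->completed = false;
}

/* Drop one reference; whoever drops the last one completes the parent. */
static void dio_model_put(struct dio_model *dio)
{
	if (atomic_fetch_sub(&dio->refs, 1) != 1)
		return;
	dio->completed = true;
}

/* Completion handler for one split: record any error, then drop a ref. */
static void dio_model_end_split(struct dio_model *dio, int err)
{
	if (err)
		dio->status = err;
	dio_model_put(dio);
}

/*
 * Split total_len into pieces of at most max_len bytes. A reference is
 * taken for every piece except the last, which inherits the submitter's
 * initial reference. Returns the number of pieces.
 */
static size_t dio_model_submit(struct dio_model *dio, size_t total_len,
			       size_t max_len)
{
	size_t nr = 0;

	while (total_len > 0) {
		size_t len = total_len < max_len ? total_len : max_len;

		total_len -= len;
		if (total_len > 0)
			atomic_fetch_add(&dio->refs, 1);
		nr++;
	}
	return nr;
}
```

Handing the initial reference to the last split means a single-bio I/O needs no extra increment, and the parent cannot complete before the submitter has finished setting up all of the splits.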
Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval --- fs/btrfs/btrfs_inode.h | 16 ---- fs/btrfs/inode.c | 185 +++++++++++++++-------------------------- 2 files changed, 65 insertions(+), 136 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 94476a8be4cc..3fb2f5a11ee3 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -294,11 +294,8 @@ static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation) return ret; } -#define BTRFS_DIO_ORIG_BIO_SUBMITTED 0x1 - struct btrfs_dio_private { struct inode *inode; - unsigned long flags; u64 logical_offset; u64 disk_bytenr; u64 bytes; @@ -309,22 +306,9 @@ struct btrfs_dio_private { */ refcount_t refs; - /* IO errors */ - int errors; - - /* orig_bio is our btrfs_io_bio */ - struct bio *orig_bio; - /* dio_bio came from fs/direct-io.c */ struct bio *dio_bio; - /* - * The original bio may be split to several sub-bios, this is - * done during endio of sub-bios - */ - blk_status_t (*subio_endio)(struct inode *, struct btrfs_io_bio *, - blk_status_t); - /* Checksums. */ u8 sums[]; }; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index fe87c465b13c..79b884d2f3ed 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7356,6 +7356,29 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, return ret; } +static void btrfs_dio_private_put(struct btrfs_dio_private *dip) +{ + /* + * This implies a barrier so that stores to dio_bio->bi_status before + * this and loads of dio_bio->bi_status after this are fully ordered. 
+ */ + if (!refcount_dec_and_test(&dip->refs)) + return; + + if (bio_op(dip->dio_bio) == REQ_OP_WRITE) { + __endio_write_update_ordered(dip->inode, dip->logical_offset, + dip->bytes, + !dip->dio_bio->bi_status); + } else { + unlock_extent(&BTRFS_I(dip->inode)->io_tree, + dip->logical_offset, + dip->logical_offset + dip->bytes - 1); + } + + dio_end_io(dip->dio_bio); + kfree(dip); +} + static inline blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio, int mirror_num) @@ -7678,8 +7701,9 @@ static blk_status_t __btrfs_subio_endio_read(struct inode *inode, return err; } -static blk_status_t btrfs_subio_endio_read(struct inode *inode, - struct btrfs_io_bio *io_bio, blk_status_t err) +static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, + struct btrfs_io_bio *io_bio, + blk_status_t err) { bool skip_csum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM; @@ -7693,28 +7717,6 @@ static blk_status_t btrfs_subio_endio_read(struct inode *inode, } } -static void btrfs_endio_direct_read(struct bio *bio) -{ - struct btrfs_dio_private *dip = bio->bi_private; - struct inode *inode = dip->inode; - struct bio *dio_bio; - struct btrfs_io_bio *io_bio = btrfs_io_bio(bio); - blk_status_t err = bio->bi_status; - - if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED) - err = btrfs_subio_endio_read(inode, io_bio, err); - - unlock_extent(&BTRFS_I(inode)->io_tree, dip->logical_offset, - dip->logical_offset + dip->bytes - 1); - dio_bio = dip->dio_bio; - - kfree(dip); - - dio_bio->bi_status = err; - dio_end_io(dio_bio); - bio_put(bio); -} - static void __endio_write_update_ordered(struct inode *inode, const u64 offset, const u64 bytes, const bool uptodate) @@ -7758,21 +7760,6 @@ static void __endio_write_update_ordered(struct inode *inode, } } -static void btrfs_endio_direct_write(struct bio *bio) -{ - struct btrfs_dio_private *dip = bio->bi_private; - struct bio *dio_bio = dip->dio_bio; - - __endio_write_update_ordered(dip->inode, dip->logical_offset, - dip->bytes, 
!bio->bi_status); - - kfree(dip); - - dio_bio->bi_status = bio->bi_status; - dio_end_io(dio_bio); - bio_put(bio); -} - static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data, struct bio *bio, u64 offset) { @@ -7796,31 +7783,16 @@ static void btrfs_end_dio_bio(struct bio *bio) (unsigned long long)bio->bi_iter.bi_sector, bio->bi_iter.bi_size, err); - if (dip->subio_endio) - err = dip->subio_endio(dip->inode, btrfs_io_bio(bio), err); - - if (err) { - /* - * We want to perceive the errors flag being set before - * decrementing the reference count. We don't need a barrier - * since atomic operations with a return value are fully - * ordered as per atomic_t.txt - */ - dip->errors = 1; + if (bio_op(bio) == REQ_OP_READ) { + err = btrfs_check_read_dio_bio(dip->inode, btrfs_io_bio(bio), + err); } - /* if there are more bios still pending for this dio, just exit */ - if (!refcount_dec_and_test(&dip->refs)) - goto out; + if (err) + dip->dio_bio->bi_status = err; - if (dip->errors) { - bio_io_error(dip->orig_bio); - } else { - dip->dio_bio->bi_status = BLK_STS_OK; - bio_endio(dip->orig_bio); - } -out: bio_put(bio); + btrfs_dio_private_put(dip); } static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, @@ -7883,7 +7855,6 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio, const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM); size_t dip_size; struct btrfs_dio_private *dip; - struct bio *bio; dip_size = sizeof(*dip); if (!write && csum) { @@ -7899,15 +7870,10 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio, if (!dip) return NULL; - bio = btrfs_bio_clone(dio_bio); - bio->bi_private = dip; - btrfs_io_bio(bio)->logical = file_offset; - dip->inode = inode; dip->logical_offset = file_offset; dip->bytes = dio_bio->bi_iter.bi_size; dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9; - dip->orig_bio = bio; dip->dio_bio = dio_bio; refcount_set(&dip->refs, 1); @@ -7918,11 +7884,6 @@ 
static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio, dip->bytes; dio_data->unsubmitted_oe_range_start = dio_data->unsubmitted_oe_range_end; - - bio->bi_end_io = btrfs_endio_direct_write; - } else { - bio->bi_end_io = btrfs_endio_direct_read; - dip->subio_endio = btrfs_subio_endio_read; } return dip; } @@ -7933,9 +7894,10 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, const bool write = (bio_op(dio_bio) == REQ_OP_WRITE); const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + const bool raid56 = (btrfs_data_alloc_profile(fs_info) & + BTRFS_BLOCK_GROUP_RAID56_MASK); struct btrfs_dio_private *dip; struct bio *bio; - struct bio *orig_bio; u64 start_sector; int async_submit = 0; u64 submit_len; @@ -7967,89 +7929,72 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, goto out_err; } - orig_bio = dip->orig_bio; - start_sector = orig_bio->bi_iter.bi_sector; - submit_len = orig_bio->bi_iter.bi_size; - ret = btrfs_get_io_geometry(fs_info, btrfs_op(orig_bio), - start_sector << 9, submit_len, &geom); - if (ret) - goto out_err; + start_sector = dio_bio->bi_iter.bi_sector; + submit_len = dio_bio->bi_iter.bi_size; - if (geom.len >= submit_len) { - bio = orig_bio; - dip->flags |= BTRFS_DIO_ORIG_BIO_SUBMITTED; - goto submit; - } - - /* async crcs make it difficult to collect full stripe writes. */ - if (btrfs_data_alloc_profile(fs_info) & BTRFS_BLOCK_GROUP_RAID56_MASK) - async_submit = 0; - else - async_submit = 1; - - /* bio split */ - ASSERT(geom.len <= INT_MAX); do { + ret = btrfs_get_io_geometry(fs_info, btrfs_op(dio_bio), + start_sector << 9, submit_len, + &geom); + if (ret) { + status = errno_to_blk_status(ret); + goto out_err; + } + ASSERT(geom.len <= INT_MAX); + clone_len = min_t(int, submit_len, geom.len); /* * This will never fail as it's passing GPF_NOFS and * the allocation is backed by btrfs_bioset. 
*/ - bio = btrfs_bio_clone_partial(orig_bio, clone_offset, - clone_len); + bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len); bio->bi_private = dip; bio->bi_end_io = btrfs_end_dio_bio; btrfs_io_bio(bio)->logical = file_offset; ASSERT(submit_len >= clone_len); submit_len -= clone_len; - if (submit_len == 0) - break; /* * Increase the count before we submit the bio so we know * the end IO handler won't happen before we increase the * count. Otherwise, the dip might get freed before we're * done setting it up. + * + * We transfer the initial reference to the last bio, so we + * don't need to increment the reference count for the last one. */ - refcount_inc(&dip->refs); + if (submit_len > 0) { + refcount_inc(&dip->refs); + /* + * If we are submitting more than one bio, submit them + * all asynchronously. The exception is RAID 5 or 6, as + * asynchronous checksums make it difficult to collect + * full stripe writes. + */ + if (!raid56) + async_submit = 1; + } status = btrfs_submit_dio_bio(bio, inode, file_offset, async_submit); if (status) { bio_put(bio); - refcount_dec(&dip->refs); + if (submit_len > 0) + refcount_dec(&dip->refs); goto out_err; } clone_offset += clone_len; start_sector += clone_len >> 9; file_offset += clone_len; - - ret = btrfs_get_io_geometry(fs_info, btrfs_op(orig_bio), - start_sector << 9, submit_len, &geom); - if (ret) - goto out_err; } while (submit_len > 0); + return; -submit: - status = btrfs_submit_dio_bio(bio, inode, file_offset, async_submit); - if (!status) - return; - - if (bio != orig_bio) - bio_put(bio); out_err: - dip->errors = 1; - /* - * Before atomic variable goto zero, we must make sure dip->errors is - * perceived to be set. 
This ordering is ensured by the fact that an - * atomic operations with a return value are fully ordered as per - * atomic_t.txt - */ - if (refcount_dec_and_test(&dip->refs)) - bio_io_error(dip->orig_bio); + dip->dio_bio->bi_status = status; + btrfs_dio_private_put(dip); } static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info, From patchwork Thu Apr 16 21:46:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493819 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6DA1E14DD for ; Thu, 16 Apr 2020 21:46:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 50B3A2220A for ; Thu, 16 Apr 2020 21:46:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="ZLnoSzEQ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728538AbgDPVqv (ORCPT ); Thu, 16 Apr 2020 17:46:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48922 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728502AbgDPVqt (ORCPT ); Thu, 16 Apr 2020 17:46:49 -0400 Received: from mail-pj1-x1043.google.com (mail-pj1-x1043.google.com [IPv6:2607:f8b0:4864:20::1043]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 69007C061A0C for ; Thu, 16 Apr 2020 14:46:49 -0700 (PDT) Received: by mail-pj1-x1043.google.com with SMTP id a32so125185pje.5 for ; Thu, 16 Apr 2020 14:46:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; 
bh=fBIBBXiDKsW6ibbp4jYvW0Cq/4xUMCc2deFJPEe5w0c=; b=ZLnoSzEQQVfT7ThtkDFKekNNQNfkwIb0GuV9Ei1KO3ReHrernHboGhsRZ84Ghfn1RS +AJL0OrFtiD0Sm1pzvpRSZJyC4AdprwL1P6uLsRfL59gXEicPhtXFNJq6Rzh4wYgl+DO 8WACjZjH8xRQZ/RK2Hjx2mRGcm78XOzz18sb1+EZAlVoxRAOAZtmkMgjeVtO/Fo0mmw5 ZGXJD+sL7hatNSJm2HSaK+FvH7294qsfSMgvJ/gtQTdRmNk+cfGnrILyP0qkwf58ake1 6lwZMuPwvd5NuCQBDDn6yuD42ehPIauQW/Ey5NmXOZM+vnib5aDvy2B+ZCLJWkFxXr7/ KXXw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=fBIBBXiDKsW6ibbp4jYvW0Cq/4xUMCc2deFJPEe5w0c=; b=XSR7E5rNqauF71cZqgyZVdU2fVNRk010cRDOk55VXYpKxdIG3WbOb0l25611nX0QZU BjFtbUeXSzizYT7SB8PG0o+US50EuP3XereKZe4eE8RjBLRB9FLz4FSqBBrMj2keuMwF sTutK/SirOGx85ZJMs+fr35y6LOd6+yQtuyxruhHPr0jSaEIus+fJ2LWUXQBcS9oRWw1 aRRDYqC6eAJG/uhy3bRJOE+IGeP5I1cVXgx9isUqjSeToZw2E9FgFif2UtVDackbaPOb cJ+Gz9ovB02b3dmlnCPg3PwDi+ORktT3+RD5LVfCNXDQXTkIWEP3S19m7YKWsyr99o9c aAuw== X-Gm-Message-State: AGi0PuYwRodlrsZrMBqJwE/7Hn8urJC8cz+rHcHCtOJBaZCudzdT7STk Z1Szz4a2gDcZbb8MsU0Y6TBa3FR9Cvw= X-Google-Smtp-Source: APiQypLFik1luT0aFMiF46+x0wQ2LtWr3p1vAKI/j8nkVZDya2AdMrlQ7SAAVe9PLkEIzlons90W+g== X-Received: by 2002:a17:90a:e2c1:: with SMTP id fr1mr471073pjb.124.1587073608250; Thu, 16 Apr 2020 14:46:48 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:47 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 13/15] btrfs: simplify direct I/O read repair Date: Thu, 16 Apr 2020 14:46:23 -0700 Message-Id: X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar 
Sandoval

Direct I/O read repair was originally implemented in commit 8b110e393c5a ("Btrfs: implement repair function when direct read fails"). This implementation is unnecessarily complicated. There is major code duplication between __btrfs_subio_endio_read() (checks checksums and handles I/O errors for files with checksums), __btrfs_correct_data_nocsum() (handles I/O errors for files without checksums), btrfs_retry_endio() (checks checksums and handles I/O errors for retries of files with checksums), and btrfs_retry_endio_nocsum() (handles I/O errors for retries of files without checksums). If it sounds like these should be one function, that's because they should. Additionally, these functions are very hard to follow due to their excessive use of goto.

This commit replaces the original implementation. Now that the previous commit got rid of orig_bio, we can reuse the same endio callback for repair I/O and the original I/O; we just need to track the file offset and original iterator in the repair bio. We can also unify the handling of files with and without checksums and simplify the control flow. We also no longer have to wait for each repair I/O to complete one by one.
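The unified control flow can be illustrated with a small userspace sketch. Everything here is a hypothetical model rather than the kernel code: toy_csum() is a byte sum standing in for the real checksum, a NULL csums pointer stands in for an inode with NODATASUM set, and instead of submitting repair bios the function only marks which sectors would need repair. The point is the single per-sector condition that covers all four of the old cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy checksum (an assumption for illustration): the sum of the bytes. */
static uint32_t toy_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += data[i];
	return sum;
}

/*
 * Walk the buffer one sector at a time. A sector is good when the I/O
 * succeeded and either there are no checksums to verify or its checksum
 * matches. Anything else is flagged for repair. Returns the number of
 * sectors flagged.
 */
static size_t check_read_model(const uint8_t *data, size_t nr_sectors,
			       size_t sectorsize, const uint32_t *csums,
			       bool uptodate, bool *needs_repair)
{
	size_t nr_bad = 0;

	for (size_t i = 0; i < nr_sectors; i++) {
		const uint8_t *sector = data + i * sectorsize;
		bool ok = uptodate &&
			  (!csums || toy_csum(sector, sectorsize) == csums[i]);

		needs_repair[i] = !ok;
		if (!ok)
			nr_bad++;
	}
	return nr_bad;
}
```

Because the same check runs for the original read and for a repair read, a repair that still fails simply flags the sector again; no separate retry callback is needed in this model.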
Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval Reviewed-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 2 + fs/btrfs/inode.c | 268 +++++++------------------------------------ 2 files changed, 44 insertions(+), 226 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 85e98ba349a8..6e1d97bb7652 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2631,6 +2631,8 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio, } bio_add_page(bio, page, failrec->len, pg_offset); + btrfs_io_bio(bio)->logical = failrec->start; + btrfs_io_bio(bio)->iter = bio->bi_iter; return bio; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 79b884d2f3ed..2580f2d251d4 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7435,19 +7435,17 @@ static int btrfs_check_dio_repairable(struct inode *inode, static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio, struct page *page, unsigned int pgoff, - u64 start, u64 end, int failed_mirror, - bio_end_io_t *repair_endio, void *repair_arg) + u64 start, u64 end, int failed_mirror) { + struct btrfs_dio_private *dip = failed_bio->bi_private; struct io_failure_record *failrec; struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; struct bio *bio; int isector; unsigned int read_mode = 0; - int segs; int ret; blk_status_t status; - struct bio_vec bvec; BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); @@ -7462,261 +7460,79 @@ static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio, return BLK_STS_IOERR; } - segs = bio_segments(failed_bio); - bio_get_first_bvec(failed_bio, &bvec); - if (segs > 1 || - (bvec.bv_len > btrfs_inode_sectorsize(inode))) + if (btrfs_io_bio(failed_bio)->iter.bi_size > inode->i_sb->s_blocksize) read_mode |= REQ_FAILFAST_DEV; isector = start - btrfs_io_bio(failed_bio)->logical; isector >>= inode->i_sb->s_blocksize_bits; - bio = 
btrfs_create_repair_bio(inode, failed_bio, failrec, page, - pgoff, isector, repair_endio, repair_arg); + bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page, pgoff, + isector, failed_bio->bi_end_io, dip); bio->bi_opf = REQ_OP_READ | read_mode; btrfs_debug(BTRFS_I(inode)->root->fs_info, "repair DIO read error: submitting new dio read[%#x] to this_mirror=%d, in_validation=%d", read_mode, failrec->this_mirror, failrec->in_validation); + refcount_inc(&dip->refs); status = submit_dio_repair_bio(inode, bio, failrec->this_mirror); if (status) { free_io_failure(failure_tree, io_tree, failrec); bio_put(bio); + refcount_dec(&dip->refs); } return status; } -struct btrfs_retry_complete { - struct completion done; - struct inode *inode; - u64 start; - int uptodate; -}; - -static void btrfs_retry_endio_nocsum(struct bio *bio) -{ - struct btrfs_retry_complete *done = bio->bi_private; - struct inode *inode = done->inode; - struct bio_vec *bvec; - struct extent_io_tree *io_tree, *failure_tree; - struct bvec_iter_all iter_all; - - if (bio->bi_status) - goto end; - - ASSERT(bio->bi_vcnt == 1); - io_tree = &BTRFS_I(inode)->io_tree; - failure_tree = &BTRFS_I(inode)->io_failure_tree; - ASSERT(bio_first_bvec_all(bio)->bv_len == btrfs_inode_sectorsize(inode)); - - done->uptodate = 1; - ASSERT(!bio_flagged(bio, BIO_CLONED)); - bio_for_each_segment_all(bvec, bio, iter_all) - clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree, - io_tree, done->start, bvec->bv_page, - btrfs_ino(BTRFS_I(inode)), 0); -end: - complete(&done->done); - bio_put(bio); -} - -static blk_status_t __btrfs_correct_data_nocsum(struct inode *inode, - struct btrfs_io_bio *io_bio) +static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, + struct btrfs_io_bio *io_bio, + const bool uptodate) { - struct btrfs_fs_info *fs_info; + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; + u32 sectorsize = fs_info->sectorsize; + struct extent_io_tree *failure_tree = 
&BTRFS_I(inode)->io_failure_tree; + struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; + const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM); struct bio_vec bvec; struct bvec_iter iter; - struct btrfs_retry_complete done; - u64 start; - unsigned int pgoff; - u32 sectorsize; - int nr_sectors; - blk_status_t ret; + u64 start = io_bio->logical; + int icsum = 0; blk_status_t err = BLK_STS_OK; - fs_info = BTRFS_I(inode)->root->fs_info; - sectorsize = fs_info->sectorsize; - - start = io_bio->logical; - done.inode = inode; - io_bio->bio.bi_iter = io_bio->iter; + __bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) { + unsigned int i, nr_sectors, pgoff; - bio_for_each_segment(bvec, &io_bio->bio, iter) { nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len); pgoff = bvec.bv_offset; - -next_block_or_try_again: - done.uptodate = 0; - done.start = start; - init_completion(&done.done); - - ret = dio_read_error(inode, &io_bio->bio, bvec.bv_page, - pgoff, start, start + sectorsize - 1, - io_bio->mirror_num, - btrfs_retry_endio_nocsum, &done); - if (ret) { - err = ret; - goto next; - } - - wait_for_completion_io(&done.done); - - if (!done.uptodate) { - /* We might have another mirror, so try again */ - goto next_block_or_try_again; - } - -next: - start += sectorsize; - - nr_sectors--; - if (nr_sectors) { - pgoff += sectorsize; + for (i = 0; i < nr_sectors; i++) { ASSERT(pgoff < PAGE_SIZE); - goto next_block_or_try_again; - } - } - - return err; -} - -static void btrfs_retry_endio(struct bio *bio) -{ - struct btrfs_retry_complete *done = bio->bi_private; - struct btrfs_io_bio *io_bio = btrfs_io_bio(bio); - struct extent_io_tree *io_tree, *failure_tree; - struct inode *inode = done->inode; - struct bio_vec *bvec; - int uptodate; - int ret; - int i = 0; - struct bvec_iter_all iter_all; - - if (bio->bi_status) - goto end; - - uptodate = 1; - - ASSERT(bio->bi_vcnt == 1); - ASSERT(bio_first_bvec_all(bio)->bv_len == btrfs_inode_sectorsize(done->inode)); 
- - io_tree = &BTRFS_I(inode)->io_tree; - failure_tree = &BTRFS_I(inode)->io_failure_tree; - - ASSERT(!bio_flagged(bio, BIO_CLONED)); - bio_for_each_segment_all(bvec, bio, iter_all) { - ret = check_data_csum(inode, io_bio, i, bvec->bv_page, - bvec->bv_offset, done->start, - bvec->bv_len); - if (!ret) - clean_io_failure(BTRFS_I(inode)->root->fs_info, - failure_tree, io_tree, done->start, - bvec->bv_page, - btrfs_ino(BTRFS_I(inode)), - bvec->bv_offset); - else - uptodate = 0; - i++; - } - - done->uptodate = uptodate; -end: - complete(&done->done); - bio_put(bio); -} - -static blk_status_t __btrfs_subio_endio_read(struct inode *inode, - struct btrfs_io_bio *io_bio, blk_status_t err) -{ - struct btrfs_fs_info *fs_info; - struct bio_vec bvec; - struct bvec_iter iter; - struct btrfs_retry_complete done; - u64 start; - u64 offset = 0; - u32 sectorsize; - int nr_sectors; - unsigned int pgoff; - int csum_pos; - bool uptodate = (err == 0); - int ret; - blk_status_t status; - - fs_info = BTRFS_I(inode)->root->fs_info; - sectorsize = fs_info->sectorsize; - - err = BLK_STS_OK; - start = io_bio->logical; - done.inode = inode; - io_bio->bio.bi_iter = io_bio->iter; - - bio_for_each_segment(bvec, &io_bio->bio, iter) { - nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len); - - pgoff = bvec.bv_offset; -next_block: - if (uptodate) { - csum_pos = BTRFS_BYTES_TO_BLKS(fs_info, offset); - ret = check_data_csum(inode, io_bio, csum_pos, - bvec.bv_page, pgoff, start, - sectorsize); - if (likely(!ret)) - goto next; - } -try_again: - done.uptodate = 0; - done.start = start; - init_completion(&done.done); - - status = dio_read_error(inode, &io_bio->bio, bvec.bv_page, - pgoff, start, start + sectorsize - 1, - io_bio->mirror_num, btrfs_retry_endio, - &done); - if (status) { - err = status; - goto next; - } - - wait_for_completion_io(&done.done); - - if (!done.uptodate) { - /* We might have another mirror, so try again */ - goto try_again; - } -next: - offset += sectorsize; - start += 
sectorsize; - - ASSERT(nr_sectors); - - nr_sectors--; - if (nr_sectors) { + if (uptodate && + (!csum || !check_data_csum(inode, io_bio, icsum, + bvec.bv_page, pgoff, + start, sectorsize))) { + clean_io_failure(fs_info, failure_tree, io_tree, + start, bvec.bv_page, + btrfs_ino(BTRFS_I(inode)), + pgoff); + } else { + blk_status_t status; + + status = dio_read_error(inode, &io_bio->bio, + bvec.bv_page, pgoff, + start, + start + sectorsize - 1, + io_bio->mirror_num); + if (status) + err = status; + } + start += sectorsize; + icsum++; pgoff += sectorsize; - ASSERT(pgoff < PAGE_SIZE); - goto next_block; } } - return err; } -static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, - struct btrfs_io_bio *io_bio, - blk_status_t err) -{ - bool skip_csum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM; - - if (skip_csum) { - if (unlikely(err)) - return __btrfs_correct_data_nocsum(inode, io_bio); - else - return BLK_STS_OK; - } else { - return __btrfs_subio_endio_read(inode, io_bio, err); - } -} - static void __endio_write_update_ordered(struct inode *inode, const u64 offset, const u64 bytes, const bool uptodate) @@ -7785,7 +7601,7 @@ static void btrfs_end_dio_bio(struct bio *bio) if (bio_op(bio) == REQ_OP_READ) { err = btrfs_check_read_dio_bio(dip->inode, btrfs_io_bio(bio), - err); + !err); } if (err) From patchwork Thu Apr 16 21:46:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493821 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C6A546CA for ; Thu, 16 Apr 2020 21:46:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AECB5221F9 for ; Thu, 16 Apr 2020 21:46:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) 
header.d=osandov-com.20150623.gappssmtp.com header.i=@osandov-com.20150623.gappssmtp.com header.b="J5kms3vT" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728584AbgDPVqz (ORCPT ); Thu, 16 Apr 2020 17:46:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48928 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728528AbgDPVqu (ORCPT ); Thu, 16 Apr 2020 17:46:50 -0400 Received: from mail-pl1-x643.google.com (mail-pl1-x643.google.com [IPv6:2607:f8b0:4864:20::643]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5F97EC061A0C for ; Thu, 16 Apr 2020 14:46:50 -0700 (PDT) Received: by mail-pl1-x643.google.com with SMTP id w3so127150plz.5 for ; Thu, 16 Apr 2020 14:46:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=v/fPOjwOAoYka/OhWwVPsC5QhTY/u8lwlVc95DvkDNc=; b=J5kms3vTWTNFXnLWB+t72U5h0VHZH5C2jHgArcFlzB/v3LlqdG25R4kJ5CBO80IyoT bYUQa37S1vdGmWErDxAa/GvvM5W02vfNtuixM+dNdaieHThnpd2nSzymmqm/nQXLxK3a POJkciB+XyNwaohpyTs+dgwHKLw5e+e0aNMd0S8Gl6VRiUzk+ZKQB/5FiLcmtOnXKmOr /vFblMmFV5YXICwkd165WC6C3PwrwJeJXxO5odKZU8UmgvAsnNZL/16Exvm/hHXE7+PW +mFi29lmZjvoPeSozMKylLtSd4aZjejd5sJeApjF4Y9UVFfSrErlFONpDRTRfnN+lfLx CPlA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=v/fPOjwOAoYka/OhWwVPsC5QhTY/u8lwlVc95DvkDNc=; b=bDsAK3hdHVRg48UrGaOj3VfPD3CHMydwg4aG2hIlBR1ZPYtZfMjRpL059JrZJA9Xaj 7ppHVrGBNSkpTtCe6bGUnN2Mk6ecMpwdjYWA5fEaLAsmm0KxYKx2QeqM68CHhLi5daJN Zss72/BJMPb4ARQdszDcHK54CAvxrAWmxTrswXtD7zURa7VFfk4OpIziIkMyQP/314EZ uxEeqrjaU4tPM0yro97a8KDTvDqmDE+PGvpdWeet28o6SeGzRlARPRIC0iaRFgFXfW0z qo0XK0UTVoKbWt7T+h0KCe0eNP05wQDY54nO2LD4AIocL1amHpEaCz1Ukca/Jixl2lg5 ZP3Q== 
X-Gm-Message-State: AGi0PuZmFidijecmTBxp8PfbqICVB56mjBvAUAJRclKPo8thr7sipei9 1uvs6BJoKUN/h7bdbxrgOCceC5W6dDA= X-Google-Smtp-Source: APiQypIz4Mf40KrkcRVPtXq2pq7lXraOVtS7Wi01khwI8GD1zffytFJqZI/Eem2QUfGiML3lar92+Q== X-Received: by 2002:a17:90a:65c5:: with SMTP id i5mr491426pjs.18.1587073609523; Thu, 16 Apr 2020 14:46:49 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:48 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 14/15] btrfs: get rid of endio_repair_workers Date: Thu, 16 Apr 2020 14:46:24 -0700 Message-Id: X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval This was originally added in commit 8b110e393c5a ("Btrfs: implement repair function when direct read fails") to avoid a deadlock. In that commit, the direct I/O read endio executes on the endio_workers workqueue, submits a repair bio, and waits for it to complete. The repair bio endio must execute on a different workqueue, otherwise it could block on the endio_workers workqueue becoming available, which won't happen because the original endio is blocked on the repair bio. As of the previous commit, the original endio doesn't wait for the repair bio, so this separate workqueue is unnecessary. 
Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval --- fs/btrfs/ctree.h | 1 - fs/btrfs/disk-io.c | 8 +------- fs/btrfs/disk-io.h | 1 - fs/btrfs/inode.c | 2 +- 4 files changed, 2 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c322568231a4..91b9052f315e 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -758,7 +758,6 @@ struct btrfs_fs_info { struct btrfs_workqueue *endio_workers; struct btrfs_workqueue *endio_meta_workers; struct btrfs_workqueue *endio_raid56_workers; - struct btrfs_workqueue *endio_repair_workers; struct btrfs_workqueue *rmw_workers; struct btrfs_workqueue *endio_meta_write_workers; struct btrfs_workqueue *endio_write_workers; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a6cb5cbbdb9f..22efd6defcf7 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -709,9 +709,7 @@ static void end_workqueue_bio(struct bio *bio) else wq = fs_info->endio_write_workers; } else { - if (unlikely(end_io_wq->metadata == BTRFS_WQ_ENDIO_DIO_REPAIR)) - wq = fs_info->endio_repair_workers; - else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) + if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) wq = fs_info->endio_raid56_workers; else if (end_io_wq->metadata) wq = fs_info->endio_meta_workers; @@ -1942,7 +1940,6 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info->workers); btrfs_destroy_workqueue(fs_info->endio_workers); btrfs_destroy_workqueue(fs_info->endio_raid56_workers); - btrfs_destroy_workqueue(fs_info->endio_repair_workers); btrfs_destroy_workqueue(fs_info->rmw_workers); btrfs_destroy_workqueue(fs_info->endio_write_workers); btrfs_destroy_workqueue(fs_info->endio_freespace_worker); @@ -2148,8 +2145,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info, fs_info->endio_raid56_workers = btrfs_alloc_workqueue(fs_info, "endio-raid56", flags, max_active, 4); - fs_info->endio_repair_workers = - btrfs_alloc_workqueue(fs_info, "endio-repair", 
flags, 1, 0); fs_info->rmw_workers = btrfs_alloc_workqueue(fs_info, "rmw", flags, max_active, 2); fs_info->endio_write_workers = @@ -2173,7 +2168,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info, fs_info->flush_workers && fs_info->endio_workers && fs_info->endio_meta_workers && fs_info->endio_meta_write_workers && - fs_info->endio_repair_workers && fs_info->endio_write_workers && fs_info->endio_raid56_workers && fs_info->endio_freespace_worker && fs_info->rmw_workers && fs_info->caching_workers && fs_info->readahead_workers && diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index cd629113f61c..734bc5270b6a 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -25,7 +25,6 @@ enum btrfs_wq_endio_type { BTRFS_WQ_ENDIO_METADATA, BTRFS_WQ_ENDIO_FREE_SPACE, BTRFS_WQ_ENDIO_RAID56, - BTRFS_WQ_ENDIO_DIO_REPAIR, }; static inline u64 btrfs_sb_offset(int mirror) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 2580f2d251d4..72fb398a88f7 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7388,7 +7388,7 @@ static inline blk_status_t submit_dio_repair_bio(struct inode *inode, BUG_ON(bio_op(bio) == REQ_OP_WRITE); - ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DIO_REPAIR); + ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA); if (ret) return ret; From patchwork Thu Apr 16 21:46:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 11493817 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B822B14DD for ; Thu, 16 Apr 2020 21:46:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 96AE822242 for ; Thu, 16 Apr 2020 21:46:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=osandov-com.20150623.gappssmtp.com 
header.i=@osandov-com.20150623.gappssmtp.com header.b="pu2ItKPM" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728568AbgDPVqx (ORCPT ); Thu, 16 Apr 2020 17:46:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48934 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728526AbgDPVqw (ORCPT ); Thu, 16 Apr 2020 17:46:52 -0400 Received: from mail-pl1-x642.google.com (mail-pl1-x642.google.com [IPv6:2607:f8b0:4864:20::642]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D5852C061A0C for ; Thu, 16 Apr 2020 14:46:51 -0700 (PDT) Received: by mail-pl1-x642.google.com with SMTP id y22so129046pll.4 for ; Thu, 16 Apr 2020 14:46:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=LcoO5JXWi6U7hEw7U4/GAhMEvthRp0rDTrpYyehIiys=; b=pu2ItKPMmCAPQpe8ZDw1JYhmPmvKSabf9m4nwrmqkBBQ0DwiNIisUFrj5MIkyPHcPJ 54f3gQmvVV3uXJECbslaJ1Q5OAej2RqbBvjatgVSj1EPeEwrtFsy5cfb0ZYIvMmaNMl7 va/IPLjTFV/8RfOMwei1epbDK+4r/QFcX0VfMLdoKrw+s2sgKbjaBdw4ZesbrK8q21Y4 caU+JOK9rYsrUm5DF8CKjx4dWt6EFVgYfjJizekEKKh4ZvlQCdHZ5U3fE6RbpFZMOGOj smZDdZbu1urbBbjtb1XUzftcAaPbZkGU084O/gcdwlVteFwqGeOFKjTJeoMsPgMvXFj6 GWmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=LcoO5JXWi6U7hEw7U4/GAhMEvthRp0rDTrpYyehIiys=; b=rzFlSdQMRmJLuxMFJsA6uBWevJTfHqW0PZQNIM94coJo/U4B0SnD0PDXB8BKHeWZVK BjeocrtjH2hD+7tDf43K8kacTEsJ35LddXcf95DUthvLBkN4k8TYefJyFr0ObvQSBBFa s0IHkpi3NDZGgmZ+sZGhkv4V6jpF0QWTIl1+THdrpYDM+7nuWQIFS+OYNCRVp95MRhJc yxtz7hLewR3CZrHDk4xcqIMomMn0l3Wptx/F7xUYLGfFoa8cgOlF2EuGyXdA5Fc3nxe0 KfH8d9qUyWrnV+r5D3+HZewdQ7zEGKnZhitqwL3DXi/pEYRN4qR3zABuz2yJrw2CmGea Cr8A== X-Gm-Message-State: 
AGi0PuYP1TNfvsLlTSUFbeoqO6bi8SFRdrG+SPPX0jpvVlDdjqgwT8YP m1bxhi4uKTV0MUQPAXadTuTxjasDQ6I= X-Google-Smtp-Source: APiQypJJscN43vnYJiWVdA2zTQ7JNzb/ySyzE6gWbOuM90fP/Up3eYhefm0SBcrYDVDKcn+Ak+MD8w== X-Received: by 2002:a17:90a:7d09:: with SMTP id g9mr427034pjl.105.1587073610903; Thu, 16 Apr 2020 14:46:50 -0700 (PDT) Received: from vader.tfbnw.net ([2620:10d:c090:400::5:844e]) by smtp.gmail.com with ESMTPSA id 17sm12440228pgg.76.2020.04.16.14.46.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2020 14:46:50 -0700 (PDT) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Jens Axboe , Christoph Hellwig Subject: [PATCH v2 15/15] btrfs: unify buffered and direct I/O read repair Date: Thu, 16 Apr 2020 14:46:25 -0700 Message-Id: <65f462557c05818d83fe8e141b24f143e2af347e.1587072977.git.osandov@fb.com> X-Mailer: git-send-email 2.26.1 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval Currently, direct I/O has its own versions of bio_readpage_error() and btrfs_check_repairable() (dio_read_error() and btrfs_check_dio_repairable(), respectively). The main difference is that the direct I/O version doesn't do read validation. The rework of direct I/O repair makes it possible to do validation, so we can get rid of btrfs_check_dio_repairable() and combine bio_readpage_error() and dio_read_error() into a new helper, btrfs_submit_read_repair(). 
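For reviewers, here is a rough userspace sketch of the validation decision that the unified helper relies on. It is not the kernel code: struct toy_bio, its fields, and toy_needs_validation() are hypothetical stand-ins for struct bio, btrfs_io_bio, and btrfs_io_needs_validation(). The rule itself is taken from this series: a failed read covering more than one sector has to be validated sector by sector. A cloned (direct I/O) bio takes its size from the saved iterator, and a non-cloned bio sums its bvec lengths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct bio plus btrfs_io_bio. */
struct toy_bio {
	bool cloned;                 /* models bio_flagged(bio, BIO_CLONED) */
	unsigned int saved_size;     /* models btrfs_io_bio->iter.bi_size */
	const unsigned int *seg_len; /* models bv_len of each bvec */
	size_t nr_segs;              /* number of entries in seg_len */
};

/*
 * Return true if the failed read spans more than one sector, in which
 * case each sector would have to be re-read and validated on its own.
 */
static bool toy_needs_validation(const struct toy_bio *bio,
				 unsigned int sectorsize)
{
	unsigned int len = 0;
	size_t i;

	assert(sectorsize > 0);

	/* Cloned bio: the bvec table is shared, so use the saved size. */
	if (bio->cloned)
		return bio->saved_size > sectorsize;

	/* Non-cloned bio: add up the segments, stopping early if possible. */
	for (i = 0; i < bio->nr_segs; i++) {
		len += bio->seg_len[i];
		if (len > sectorsize)
			return true;
	}
	return false;
}
```

In this model, with a 4096-byte sector size, a single-sector read of either kind needs no validation pass, while a two-sector read does. The real helper works on the actual bio structures and should be taken as authoritative.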
Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval --- fs/btrfs/extent_io.c | 130 ++++++++++++++++++++----------------------- fs/btrfs/extent_io.h | 17 ++++-- fs/btrfs/inode.c | 103 ++++------------------------------ 3 files changed, 82 insertions(+), 168 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 6e1d97bb7652..4c59ff5b0e3c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2602,41 +2602,6 @@ static bool btrfs_check_repairable(struct inode *inode, return true; } - -struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio, - struct io_failure_record *failrec, - struct page *page, int pg_offset, int icsum, - bio_end_io_t *endio_func, void *data) -{ - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct bio *bio; - struct btrfs_io_bio *btrfs_failed_bio; - struct btrfs_io_bio *btrfs_bio; - - bio = btrfs_io_bio_alloc(1); - bio->bi_end_io = endio_func; - bio->bi_iter.bi_sector = failrec->logical >> 9; - bio->bi_iter.bi_size = 0; - bio->bi_private = data; - - btrfs_failed_bio = btrfs_io_bio(failed_bio); - if (btrfs_failed_bio->csum) { - u16 csum_size = btrfs_super_csum_size(fs_info->super_copy); - - btrfs_bio = btrfs_io_bio(bio); - btrfs_bio->csum = btrfs_bio->csum_inline; - icsum *= csum_size; - memcpy(btrfs_bio->csum, btrfs_failed_bio->csum + icsum, - csum_size); - } - - bio_add_page(bio, page, failrec->len, pg_offset); - btrfs_io_bio(bio)->logical = failrec->start; - btrfs_io_bio(bio)->iter = bio->bi_iter; - - return bio; -} - static bool btrfs_io_needs_validation(struct inode *inode, struct bio *bio) { struct bio_vec *bvec; @@ -2654,72 +2619,96 @@ static bool btrfs_io_needs_validation(struct inode *inode, struct bio *bio) /* * We need to validate each sector individually if the failed I/O was * for multiple sectors. + * + * There are a few possible bios that can end up here: + * 1. A buffered read bio, which is not cloned. + * 2. A direct I/O read bio, which is cloned. + * 3. 
A (buffered or direct) repair bio, which is not cloned. + * + * For cloned bios (case 2), we can get the size from + * btrfs_io_bio->iter; for non-cloned bios (cases 1 and 3), we can get + * it from the bvecs. */ - bio_for_each_bvec_all(bvec, bio, i) { - len += bvec->bv_len; - if (len > inode->i_sb->s_blocksize) + if (bio_flagged(bio, BIO_CLONED)) { + if (btrfs_io_bio(bio)->iter.bi_size > inode->i_sb->s_blocksize) return true; + } else { + bio_for_each_bvec_all(bvec, bio, i) { + len += bvec->bv_len; + if (len > inode->i_sb->s_blocksize) + return true; + } } return false; } -/* - * This is a generic handler for readpage errors. If other copies exist, read - * those and write back good data to the failed position. Does not investigate - * in remapping the failed extent elsewhere, hoping the device will be smart - * enough to do this as needed - */ -static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset, - struct page *page, u64 start, u64 end, - int failed_mirror) +blk_status_t btrfs_submit_read_repair(struct inode *inode, + struct bio *failed_bio, u64 phy_offset, + struct page *page, unsigned int pgoff, + u64 start, u64 end, int failed_mirror, + submit_bio_hook_t *submit_bio_hook) { struct io_failure_record *failrec; - struct inode *inode = page->mapping->host; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; + struct btrfs_io_bio *failed_io_bio = btrfs_io_bio(failed_bio); + int icsum = phy_offset >> inode->i_sb->s_blocksize_bits; bool need_validation; - struct bio *bio; - int read_mode = 0; + struct bio *repair_bio; + struct btrfs_io_bio *repair_io_bio; blk_status_t status; int ret; + btrfs_info(btrfs_sb(inode->i_sb), + "Repair Read Error: read error at %llu", start); + BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); ret = btrfs_get_io_failure_record(inode, start, end, &failrec); if (ret) - return ret; + return 
errno_to_blk_status(ret); need_validation = btrfs_io_needs_validation(inode, failed_bio); if (!btrfs_check_repairable(inode, need_validation, failrec, failed_mirror)) { free_io_failure(failure_tree, tree, failrec); - return -EIO; + return BLK_STS_IOERR; } + repair_bio = btrfs_io_bio_alloc(1); + repair_io_bio = btrfs_io_bio(repair_bio); + repair_bio->bi_opf = REQ_OP_READ; if (need_validation) - read_mode |= REQ_FAILFAST_DEV; + repair_bio->bi_opf |= REQ_FAILFAST_DEV; + repair_bio->bi_end_io = failed_bio->bi_end_io; + repair_bio->bi_iter.bi_sector = failrec->logical >> 9; + repair_bio->bi_private = failed_bio->bi_private; - phy_offset >>= inode->i_sb->s_blocksize_bits; - bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page, - start - page_offset(page), - (int)phy_offset, failed_bio->bi_end_io, - NULL); - bio->bi_opf = REQ_OP_READ | read_mode; + if (failed_io_bio->csum) { + u16 csum_size = btrfs_super_csum_size(fs_info->super_copy); + + repair_io_bio->csum = repair_io_bio->csum_inline; + memcpy(repair_io_bio->csum, + failed_io_bio->csum + csum_size * icsum, csum_size); + } + + bio_add_page(repair_bio, page, failrec->len, pgoff); + repair_io_bio->logical = failrec->start; + repair_io_bio->iter = repair_bio->bi_iter; btrfs_debug(btrfs_sb(inode->i_sb), - "Repair Read Error: submitting new read[%#x] to this_mirror=%d, in_validation=%d", - read_mode, failrec->this_mirror, failrec->in_validation); +"Repair Read Error: submitting new read to this_mirror=%d, in_validation=%d", + failrec->this_mirror, failrec->in_validation); - status = tree->ops->submit_bio_hook(tree->private_data, bio, failrec->this_mirror, - failrec->bio_flags); + status = submit_bio_hook(inode, repair_bio, failrec->this_mirror, + failrec->bio_flags); if (status) { free_io_failure(failure_tree, tree, failrec); - bio_put(bio); - ret = blk_status_to_errno(status); + bio_put(repair_bio); } - - return ret; + return status; } /* lots and lots of room for performance fixes in the end_bio funcs */ @@ 
-2891,9 +2880,10 @@ static void end_bio_extent_readpage(struct bio *bio) * If it can't handle the error it will return -EIO and * we remain responsible for that page. */ - ret = bio_readpage_error(bio, offset, page, start, end, - mirror); - if (ret == 0) { + if (!btrfs_submit_read_repair(inode, bio, offset, page, + start - page_offset(page), + start, end, mirror, + tree->ops->submit_bio_hook)) { uptodate = !bio->bi_status; offset += len; continue; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index f4dfac756455..a2842b2d9a98 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -66,6 +66,10 @@ struct btrfs_io_bio; struct io_failure_record; struct extent_io_tree; +typedef blk_status_t (submit_bio_hook_t)(struct inode *inode, struct bio *bio, + int mirror_num, + unsigned long bio_flags); + typedef blk_status_t (extent_submit_bio_start_t)(void *private_data, struct bio *bio, u64 bio_offset); @@ -74,8 +78,7 @@ struct extent_io_ops { * The following callbacks must be always defined, the function * pointer will be called unconditionally. 
*/ - blk_status_t (*submit_bio_hook)(struct inode *inode, struct bio *bio, - int mirror_num, unsigned long bio_flags); + submit_bio_hook_t *submit_bio_hook; int (*readpage_end_io_hook)(struct btrfs_io_bio *io_bio, u64 phy_offset, struct page *page, u64 start, u64 end, int mirror); @@ -312,10 +315,12 @@ struct io_failure_record { }; -struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio, - struct io_failure_record *failrec, - struct page *page, int pg_offset, int icsum, - bio_end_io_t *endio_func, void *data); +blk_status_t btrfs_submit_read_repair(struct inode *inode, + struct bio *failed_bio, u64 phy_offset, + struct page *page, unsigned int pgoff, + u64 start, u64 end, int failed_mirror, + submit_bio_hook_t *submit_bio_hook); + #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS bool find_lock_delalloc_range(struct inode *inode, struct page *locked_page, u64 *start, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 72fb398a88f7..a4e9f9d0a43d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7379,10 +7379,11 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip) kfree(dip); } -static inline blk_status_t submit_dio_repair_bio(struct inode *inode, - struct bio *bio, - int mirror_num) +static blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio, + int mirror_num, + unsigned long bio_flags) { + struct btrfs_dio_private *dip = bio->bi_private; struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); blk_status_t ret; @@ -7392,96 +7393,11 @@ static inline blk_status_t submit_dio_repair_bio(struct inode *inode, if (ret) return ret; + refcount_inc(&dip->refs); ret = btrfs_map_bio(fs_info, bio, mirror_num); - - return ret; -} - -static int btrfs_check_dio_repairable(struct inode *inode, - struct bio *failed_bio, - struct io_failure_record *failrec, - int failed_mirror) -{ - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - int num_copies; - - num_copies = btrfs_num_copies(fs_info, failrec->logical, 
failrec->len); - if (num_copies == 1) { - /* - * we only have a single copy of the data, so don't bother with - * all the retry and error correction code that follows. no - * matter what the error is, it is very likely to persist. - */ - btrfs_debug(fs_info, - "Check DIO Repairable: cannot repair, num_copies=%d, next_mirror %d, failed_mirror %d", - num_copies, failrec->this_mirror, failed_mirror); - return 0; - } - - failrec->failed_mirror = failed_mirror; - failrec->this_mirror++; - if (failrec->this_mirror == failed_mirror) - failrec->this_mirror++; - - if (failrec->this_mirror > num_copies) { - btrfs_debug(fs_info, - "Check DIO Repairable: (fail) num_copies=%d, next_mirror %d, failed_mirror %d", - num_copies, failrec->this_mirror, failed_mirror); - return 0; - } - - return 1; -} - -static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio, - struct page *page, unsigned int pgoff, - u64 start, u64 end, int failed_mirror) -{ - struct btrfs_dio_private *dip = failed_bio->bi_private; - struct io_failure_record *failrec; - struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; - struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; - struct bio *bio; - int isector; - unsigned int read_mode = 0; - int ret; - blk_status_t status; - - BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); - - ret = btrfs_get_io_failure_record(inode, start, end, &failrec); if (ret) - return errno_to_blk_status(ret); - - ret = btrfs_check_dio_repairable(inode, failed_bio, failrec, - failed_mirror); - if (!ret) { - free_io_failure(failure_tree, io_tree, failrec); - return BLK_STS_IOERR; - } - - if (btrfs_io_bio(failed_bio)->iter.bi_size > inode->i_sb->s_blocksize) - read_mode |= REQ_FAILFAST_DEV; - - isector = start - btrfs_io_bio(failed_bio)->logical; - isector >>= inode->i_sb->s_blocksize_bits; - bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page, pgoff, - isector, failed_bio->bi_end_io, dip); - bio->bi_opf = REQ_OP_READ | read_mode; - - 
btrfs_debug(BTRFS_I(inode)->root->fs_info, - "repair DIO read error: submitting new dio read[%#x] to this_mirror=%d, in_validation=%d", - read_mode, failrec->this_mirror, failrec->in_validation); - - refcount_inc(&dip->refs); - status = submit_dio_repair_bio(inode, bio, failrec->this_mirror); - if (status) { - free_io_failure(failure_tree, io_tree, failrec); - bio_put(bio); refcount_dec(&dip->refs); - } - - return status; + return ret; } static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, @@ -7517,11 +7433,14 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, } else { blk_status_t status; - status = dio_read_error(inode, &io_bio->bio, + status = btrfs_submit_read_repair(inode, + &io_bio->bio, + start - io_bio->logical, bvec.bv_page, pgoff, start, start + sectorsize - 1, - io_bio->mirror_num); + io_bio->mirror_num, + submit_dio_repair_bio); if (status) err = status; }