[v2] btrfs: scrub: avoid unnecessary extent tree search for simple stripes

[BUG]
When scrubbing an empty fs with RAID0, we will call scrub_simple_mirror()
again and again on ranges which have no extent at all.

This is especially obvious if we have both RAID0 and SINGLE.

 # mkfs.btrfs -f -m single -d raid0 $dev
 # mount $dev $mnt
 # xfs_io -f -c "pwrite 0 4k" $mnt/file
 # sync
 # btrfs scrub start -B $mnt

With extra tracing added to scrub_simple_mirror(), we get the following
trace:

  256.028473: scrub_simple_mirror: logical=1048576 len=4194304 bg=1048576 bg_len=4194304
  256.028930: scrub_simple_mirror: logical=5242880 len=8388608 bg=5242880 bg_len=8388608
  256.029891: scrub_simple_mirror: logical=22020096 len=65536 bg=22020096 bg_len=1073741824
  256.029892: scrub_simple_mirror: logical=22085632 len=65536 bg=22020096 bg_len=1073741824
  256.029893: scrub_simple_mirror: logical=22151168 len=65536 bg=22020096 bg_len=1073741824
  ... 16K lines skipped ...
  256.048777: scrub_simple_mirror: logical=1095630848 len=65536 bg=22020096 bg_len=1073741824
  256.048778: scrub_simple_mirror: logical=1095696384 len=65536 bg=22020096 bg_len=1073741824

The first two lines show that we call scrub_simple_mirror() just once
each for the metadata and system chunks.

But the following 16K lines are all scrub_simple_mirror() calls for the
almost empty RAID0 data block group.

Most of the calls would exit very quickly since there is no extent in
that data chunk.

[CAUSE]
For RAID0/RAID10 we call scrub_simple_stripe() to handle the scrub for
the block group. Since each stripe is internally just plain
SINGLE/RAID1, we reuse scrub_simple_mirror().

But there is a pitfall: inside scrub_simple_mirror() we will do at
least one extent tree search to find the extent in the range.

Just like in the above case, we can have a huge gap which has no extent
in it at all.
In that case, we will do the extent tree search again and again, even
though we already know there are no more extents in the block group.

[FIX]
To fix the super inefficient extent tree search, we introduce
@found_next parameter for the following functions:

- find_first_extent_item()
- scrub_simple_mirror()

If the function find_first_extent_item() returns 1 and the @found_next
pointer is provided, it will store the bytenr of the next extent (if at
the end of the extent tree, U64_MAX is used).

So for scrub_simple_stripe(), after scrubbing the current stripe and
increasing the logical bytenr, we check if our next range reaches
@found_next.

If not, we increase our @cur_logical by our increment until we reach
@found_next.

This way, even for an almost empty RAID0 block group, we just execute
"cur_logical += logical_increment;" 16K times, instead of doing the tree
search 16K times.

With the optimization, the same trace looks like this now:

  1283.376212: scrub_simple_mirror: logical=1048576 len=4194304 bg=1048576 bg_len=4194304
  1283.376754: scrub_simple_mirror: logical=5242880 len=8388608 bg=5242880 bg_len=8388608
  1283.377623: scrub_simple_mirror: logical=22020096 len=65536 bg=22020096 bg_len=1073741824
  1283.377625: scrub_simple_mirror: logical=67108864 len=65536 bg=22020096 bg_len=1073741824
  1283.377627: scrub_simple_mirror: logical=67174400 len=65536 bg=22020096 bg_len=1073741824

Note the scrub at logical 67108864: that is where the 4K write lands,
not at the beginning of the data chunk (the super block reserved space
splits the 1G chunk into two parts).

And the time duration of the chunk 22020096 is much shorter
(18887us vs 4us).

Unfortunately this optimization only works for RAID0/RAID10 with big
holes in the block group.

For real world cases it's much harder to find huge gaps (although we can
still skip several stripes).
And even for the huge gap cases, the optimization itself is hardly
observable (less than 1 second even for an almost empty 10G block group).

Also unfortunately, for RAID5 data stripes we cannot apply an
optimization similar to the RAID0/RAID10 one, due to the extra rotation.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Rebased to latest misc-next
  There are some minor conflicts related to map->stripe_len.

- Update the comments for find_first_extent_item()
  Mostly for the new parameter @found_next.
---
 fs/btrfs/scrub.c | 47 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 37 insertions(+), 10 deletions(-)

Message-ID: <8d8e77d19cb56fc954353a659b5382ecf0c4a0d6.1677723997.git.wqu@suse.com>
Date: Thu, 2 Mar 2023 10:45:00 +0800
