From patchwork Tue Mar 1 09:33:33 2022
X-Patchwork-Submitter: Patrick Steinhardt
X-Patchwork-Id: 12764397
Date: Tue, 1 Mar 2022 10:33:33 +0100
From: Patrick Steinhardt
To: git@vger.kernel.org
Cc: Derrick Stolee
Subject: [PATCH v2 0/5] fetch: more optimizations for mirror fetches
X-Mailing-List: git@vger.kernel.org

Hi,

this is another patch series with the aim to speed up mirror fetches.
It applies on top of e6ebfd0e8c (The sixth batch, 2022-02-18) with
3824153b23 (Merge branch 'ps/fetch-atomic' into next, 2022-02-18) merged
into it to fix a conflict.

The only change compared to v1 is an update to the benchmarks so that
they're less verbose, as proposed by Derrick. I also had a look at
introducing a new helper `parse_object_probably_commit()`, but I didn't
find the end result to be much of an improvement compared to the ad-hoc
`lookup_commit_in_graph() || parse_object()` dance we do right now.

Thanks!

Patrick

Patrick Steinhardt (5):
  upload-pack: look up "want" lines via commit-graph
  fetch: avoid lookup of commits when not appending to FETCH_HEAD
  refs: add ability for backends to special-case reading of symbolic refs
  remote: read symbolic refs via `refs_read_symbolic_ref()`
  refs/files-backend: optimize reading of symbolic refs

 builtin/fetch.c       | 42 +++++++++++++++++++++++++++---------------
 builtin/remote.c      |  8 +++++---
 refs.c                | 17 +++++++++++++++++
 refs.h                |  3 +++
 refs/debug.c          |  1 +
 refs/files-backend.c  | 33 ++++++++++++++++++++++++++++-----
 refs/packed-backend.c |  1 +
 refs/refs-internal.h  | 16 ++++++++++++++++
 remote.c              | 14 +++++++-------
 upload-pack.c         | 20 +++++++++++++++++---
 10 files changed, 122 insertions(+), 33 deletions(-)

Range-diff against v1:
1:  ca5e136cca ! 1:  b5c696bd8e upload-pack: look up "want" lines via commit-graph
    @@ Commit message
         Refactor parsing of both "want" and "want-ref" lines to do so.

         The following benchmark is executed in a repository with a huge number
    -    of references. It uses cached request from git-fetch(1) as input and
    -    contains about 876,000 "want" lines:
    +    of references. It uses cached request from git-fetch(1) as input to
    +    git-upload-pack(1) that contains about 876,000 "want" lines:

    -        Benchmark 1: git-upload-pack (HEAD~)
    +        Benchmark 1: HEAD~
                 Time (mean ± σ):      7.113 s ±  0.028 s    [User: 6.900 s, System: 0.662 s]
                 Range (min … max):    7.072 s …  7.168 s    10 runs

    -        Benchmark 2: git-upload-pack (HEAD)
    +        Benchmark 2: HEAD
                 Time (mean ± σ):      6.622 s ±  0.061 s    [User: 6.452 s, System: 0.650 s]
                 Range (min … max):    6.535 s …  6.727 s    10 runs

             Summary
    -          'git-upload-pack (HEAD)' ran
    -            1.07 ± 0.01 times faster than 'git-upload-pack (HEAD~)'
    +          'HEAD' ran
    +            1.07 ± 0.01 times faster than 'HEAD~'

         Signed-off-by: Patrick Steinhardt
2:  80f993dddd ! 2:  fbe76b78c3 fetch: avoid lookup of commits when not appending to FETCH_HEAD
    @@ Commit message
         Skip this busywork in case we're not writing to FETCH_HEAD.

         The following benchmark performs a mirror-fetch in a repository with about
    -    two million references:
    +    two million references via `git fetch --prune --no-write-fetch-head
    +    +refs/*:refs/*`:

    -        Benchmark 1: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)
    +        Benchmark 1: HEAD~
                 Time (mean ± σ):     75.388 s ±  1.942 s    [User: 71.103 s, System: 8.953 s]
                 Range (min … max):   73.184 s … 76.845 s    3 runs

    -        Benchmark 2: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)
    +        Benchmark 2: HEAD
                 Time (mean ± σ):     69.486 s ±  1.016 s    [User: 65.941 s, System: 8.806 s]
                 Range (min … max):   68.864 s … 70.659 s    3 runs

             Summary
    -          'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)' ran
    -            1.08 ± 0.03 times faster than 'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)'
    +          'HEAD' ran
    +            1.08 ± 0.03 times faster than 'HEAD~'

         Signed-off-by: Patrick Steinhardt
3:  28cacbdbe2 = 3:  29eb81d37c refs: add ability for backends to special-case reading of symbolic refs
4:  1d24101fe4 = 4:  0489380e00 remote: read symbolic refs via `refs_read_symbolic_ref()`
5:  7213ffdbdd ! 5:  b6eca63d3b refs/files-backend: optimize reading of symbolic refs
    @@ Commit message
         need to skip updating local symbolic references during a fetch, which is
         why the change results in a significant speedup when doing fetches in
         repositories with huge numbers of references. The following benchmark
    -    executes a mirror-fetch in a repository with about 2 million references:
    +    executes a mirror-fetch in a repository with about 2 million references
    +    via `git fetch --prune --no-write-fetch-head +refs/*:refs/*`:

    -        Benchmark 1: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)
    +        Benchmark 1: HEAD~
                 Time (mean ± σ):     68.372 s ±  2.344 s    [User: 65.629 s, System: 8.786 s]
                 Range (min … max):   65.745 s … 70.246 s    3 runs

    -        Benchmark 2: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)
    +        Benchmark 2: HEAD
                 Time (mean ± σ):     60.259 s ±  0.343 s    [User: 61.019 s, System: 7.245 s]
                 Range (min … max):   60.003 s … 60.649 s    3 runs

             Summary
    -          'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)' ran
    -            1.13 ± 0.04 times faster than 'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)'
    +          'HEAD' ran
    +            1.13 ± 0.04 times faster than 'HEAD~'

         Signed-off-by: Patrick Steinhardt
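
For readers wondering what the `parse_object_probably_commit()` helper
mentioned in the cover letter would have looked like: it was not adopted,
but the "lookup_commit_in_graph() || parse_object()" dance it would have
wrapped might be sketched roughly as follows. This is a hypothetical
illustration only, not code from this series: `lookup_commit_in_graph()`
and `parse_object()` are real functions in git's codebase, while the
helper's name and exact signature here are assumptions.

```c
/*
 * Hypothetical sketch -- not part of this series. Try the cheap
 * commit-graph lookup first, which avoids unpacking the object for
 * commits already indexed in the commit-graph, and fall back to a
 * full parse_object() only when that lookup fails (e.g. for tags or
 * commits not covered by the graph).
 */
static struct object *parse_object_probably_commit(struct repository *r,
						   const struct object_id *oid)
{
	struct commit *commit = lookup_commit_in_graph(r, oid);
	if (commit)
		return &commit->object;
	return parse_object(r, oid);
}
```

As the cover letter notes, callers that need this pattern in only a
couple of places arguably read just as clearly with the two calls spelled
out inline, which is why the helper was dropped.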