From patchwork Fri Jan 6 20:36:38 2023
Message-Id: <39eed9148782c37f5184c5fff7d0e4d1a7a2a1fe.1673037405.git.gitgitgadget@gmail.com>
Date: Fri, 06 Jan 2023 20:36:38 +0000
Subject: [PATCH 1/8] t5558: add tests for creationToken heuristic
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee , Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

As documented in the bundle URI design doc in 2da14fad8fe (docs: document bundle URI standard, 2022-08-09), the 'creationToken' member of a bundle URI allows a bundle provider to specify a total order on the bundles.

Future changes will allow the Git client to understand these members and modify its behavior around downloading the bundles in that order.
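The creationToken values are meant to be compared as unsigned 64-bit integers. [PATCH 3/8] below stores them in a `uint64_t` and reads them with `sscanf()`. The following is a minimal, illustrative sketch of a stricter parser under that assumption. `parse_creation_token()` is a hypothetical helper, not part of Git. Using `strtoull()` with an end-pointer check is this sketch's own choice, not the series' approach.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of Git: parse a creationToken value
 * as an unsigned 64-bit integer. Returns 0 on success and -1 if the
 * value is missing, does not start with a digit, has trailing
 * characters, or overflows unsigned long long (64 bits on common
 * platforms).
 */
static int parse_creation_token(const char *value, uint64_t *out)
{
	char *end;
	unsigned long long v;

	assert(out != NULL);

	/* Reject NULL, empty strings, signs, and leading whitespace. */
	if (!value || !isdigit((unsigned char)value[0]))
		return -1;

	errno = 0;
	v = strtoull(value, &end, 10);
	if (errno == ERANGE || *end != '\0')
		return -1;

	*out = (uint64_t)v;
	return 0;
}
```

The main difference from a `%u`-style `sscanf()` conversion is strictness. Such a conversion accepts an optional leading sign and stops at the first non-digit, so it does not report values like `-1` or `123abc` as unparseable.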
In the meantime, create tests that add creation tokens to the bundle list. For now, the Git client correctly ignores these unknown keys.

Signed-off-by: Derrick Stolee
---
 t/t5558-clone-bundle-uri.sh | 52 +++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9155f31fa2c..328caeeae9a 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -284,7 +284,17 @@ test_expect_success 'clone HTTP bundle' '
 	test_config -C clone-http log.excludedecoration refs/bundle/
 '
 
+# usage: test_bundle_downloaded
+test_bundle_downloaded () {
+	cat >pattern <<-EOF &&
+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1"\]
+	EOF
+	grep -f pattern "$2"
+}
+
 test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	test_when_finished rm -f trace*.txt &&
+
 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle]
@@ -304,12 +314,19 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 		uri = $HTTPD_URL/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git clone --bundle-uri="$HTTPD_URL/bundle-list" \
 		clone-from clone-list-http 2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
-	git -C clone-list-http cat-file --batch-check "$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
+
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-http-2 cat-file --batch-check

Message-Id: <9007249b9488c23f00c2d498ffd520e4af8b37a4.1673037405.git.gitgitgadget@gmail.com>
Date: Fri, 06 Jan 2023 20:36:39 +0000
Subject: [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee , Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The bundle.heuristic value communicates that the bundle list is organized to make use of the bundle..creationToken values that may be provided in the bundle list. Those values will create a total order on the bundles, allowing the Git client to download them in a specific order and even remember previously-downloaded bundles by storing the maximum creation token value.
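Parsing bundle.heuristic amounts to mapping one string to an enum. Below is a minimal sketch of that lookup. It is modelled on the `heuristics[]` table added in this patch, but it uses hypothetical names (`enum list_heuristic`, `parse_heuristic()`) and adds a NULL guard. One assumption is worth stating: the match is case-sensitive. Git's config parser lowercases variable names but keeps values as written, so a list that says `heuristic = creationtoken` would not select the heuristic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the enum this patch adds to bundle-uri.h. */
enum list_heuristic {
	HEURISTIC_NONE = 0,
	HEURISTIC_CREATIONTOKEN,
	HEURISTIC__COUNT /* must be last */
};

/* Designated initializers keep each name aligned with its enum value. */
static const char *heuristic_names[] = {
	[HEURISTIC_NONE] = "",
	[HEURISTIC_CREATIONTOKEN] = "creationToken",
};

/*
 * Map a bundle.heuristic value to the enum using an exact,
 * case-sensitive match. In this sketch, unknown values (and a NULL
 * value from a key written without '=') map to HEURISTIC_NONE
 * instead of raising an error.
 */
static enum list_heuristic parse_heuristic(const char *value)
{
	int i;

	/* The table must have exactly one name per enum value. */
	assert(sizeof(heuristic_names) / sizeof(heuristic_names[0]) ==
	       (size_t)HEURISTIC__COUNT);

	if (!value)
		return HEURISTIC_NONE;
	for (i = 0; i < HEURISTIC__COUNT; i++)
		if (!strcmp(value, heuristic_names[i]))
			return (enum list_heuristic)i;
	return HEURISTIC_NONE;
}
```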
Before implementing any logic that parses or uses the bundle..creationToken values, teach Git to parse the bundle.heuristic value from a bundle list. We can use 'test-tool bundle-uri' to print the heuristic value and verify that the parsing works correctly.

Signed-off-by: Derrick Stolee
---
 Documentation/config/bundle.txt |  7 +++++++
 bundle-uri.c                    | 21 +++++++++++++++++++++
 bundle-uri.h                    | 14 ++++++++++++++
 t/t5750-bundle-uri-parse.sh     | 19 +++++++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
index daa21eb674a..3faae386853 100644
--- a/Documentation/config/bundle.txt
+++ b/Documentation/config/bundle.txt
@@ -15,6 +15,13 @@ bundle.mode::
 	complete understanding of the bundled information (`all`) or if any one
 	of the listed bundle URIs is sufficient (`any`).
 
+bundle.heuristic::
+	If this string-valued key exists, then the bundle list is designed to
+	work well with incremental `git fetch` commands. The heuristic signals
+	that there are additional keys available for each bundle that help
+	determine which subset of bundles the client should download. The
+	only value currently understood is `creationToken`.
+
 bundle..*::
 	The `bundle..*` keys are used to describe a single item in the
 	bundle list, grouped under `` for identification purposes.
diff --git a/bundle-uri.c b/bundle-uri.c
index 36268dda172..56c94595c2a 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -9,6 +9,11 @@
 #include "config.h"
 #include "remote.h"
 
+static const char *heuristics[] = {
+	[BUNDLE_HEURISTIC_NONE] = "",
+	[BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken",
+};
+
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
 			   const struct hashmap_entry *he2,
@@ -100,6 +105,9 @@ void print_bundle_list(FILE *fp, struct bundle_list *list)
 	fprintf(fp, "\tversion = %d\n", list->version);
 	fprintf(fp, "\tmode = %s\n", mode);
 
+	if (list->heuristic)
+		fprintf(fp, "\theuristic = %s\n", heuristics[list->heuristic]);
+
 	for_all_bundles_in_list(list, summarize_bundle, fp);
 }
 
@@ -142,6 +150,19 @@ static int bundle_list_update(const char *key, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(subkey, "heuristic")) {
+		int i;
+		for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+			if (!strcmp(value, heuristics[i])) {
+				list->heuristic = i;
+				return 0;
+			}
+		}
+
+		/* Ignore unknown heuristics. */
+		return 0;
+	}
+
 	/* Ignore other unknown global keys. */
 	return 0;
 }
diff --git a/bundle-uri.h b/bundle-uri.h
index d5e89f1671c..ad82174112d 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -52,6 +52,14 @@ enum bundle_list_mode {
 	BUNDLE_MODE_ANY
 };
 
+enum bundle_list_heuristic {
+	BUNDLE_HEURISTIC_NONE = 0,
+	BUNDLE_HEURISTIC_CREATIONTOKEN,
+
+	/* Must be last. */
+	BUNDLE_HEURISTIC__COUNT,
+};
+
 /**
  * A bundle_list contains an unordered set of remote_bundle_info structs,
  * as well as information about the bundle listing, such as version and
@@ -75,6 +83,12 @@ struct bundle_list {
 	 * advertised by the bundle list at that location.
 	 */
 	char *baseURI;
+
+	/**
+	 * A list can have a heuristic, which helps reduce the number of
+	 * downloaded bundles.
+	 */
+	enum bundle_list_heuristic heuristic;
 };
 
 void init_bundle_list(struct bundle_list *list);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 7b4f930e532..6fc92a9c0d4 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -250,4 +250,23 @@ test_expect_success 'parse config format edge cases: empty key or value' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
 test_done

From patchwork Fri Jan 6 20:36:40 2023
Date: Fri, 06 Jan 2023 20:36:40 +0000
Subject: [PATCH 3/8] bundle-uri: parse bundle..creationToken values
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee , Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The previous change taught Git to parse the bundle.heuristic value, especially when its value is "creationToken". Now, teach Git to parse the bundle..creationToken values on each bundle in a bundle list.

Before implementing any logic based on creationToken values for the creationToken heuristic, parse and print these values for testing purposes.

Signed-off-by: Derrick Stolee
---
 bundle-uri.c                | 10 ++++++++++
 bundle-uri.h                |  6 ++++++
 t/t5750-bundle-uri-parse.sh | 18 ++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 56c94595c2a..63e2cc21057 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -80,6 +80,9 @@ static int summarize_bundle(struct remote_bundle_info *info, void *data)
 	FILE *fp = data;
 	fprintf(fp, "[bundle \"%s\"]\n", info->id);
 	fprintf(fp, "\turi = %s\n", info->uri);
+
+	if (info->creationToken)
+		fprintf(fp, "\tcreationToken = %"PRIu64"\n", info->creationToken);
 	return 0;
 }
 
@@ -190,6 +193,13 @@ static int bundle_list_update(const char *key, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(subkey, "creationtoken")) {
+		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
+			warning(_("could not parse bundle list key %s with value '%s'"),
+				"creationToken", value);
+		return 0;
+	}
+
 	/*
 	 * At this point, we ignore any information that we don't
 	 * understand, assuming it to be hints for a heuristic the client
diff --git a/bundle-uri.h b/bundle-uri.h
index ad82174112d..1cae418211b 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -42,6 +42,12 @@ struct remote_bundle_info {
 	 * this boolean is true.
 	 */
 	unsigned unbundled:1;
+
+	/**
+	 * If the bundle is part of a list with the creationToken
+	 * heuristic, then we use this member for sorting the bundles.
+	 */
+	uint64_t creationToken;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 6fc92a9c0d4..81bdf58b944 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -258,10 +258,13 @@ test_expect_success 'parse config format: creationToken heuristic' '
 		heuristic = creationToken
 	[bundle "one"]
 		uri = http://example.com/bundle.bdl
+		creationToken = 123456
 	[bundle "two"]
 		uri = https://example.com/bundle.bdl
+		creationToken = 12345678901234567890
 	[bundle "three"]
 		uri = file:///usr/share/git/bundle.bdl
+		creationToken = 1
 	EOF
 
 	test-tool bundle-uri parse-config expect >actual 2>err &&
@@ -269,4 +272,19 @@ test_expect_success 'parse config format: creationToken heuristic' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format edge cases: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+		creationToken = bogus
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	grep "could not parse bundle list key creationToken with value '\''bogus'\''" err
+'
+
 test_done

From patchwork Fri Jan 6 20:36:41 2023
Message-Id: <57c0174d3752fb61a05e0653de9d3057616ed16a.1673037405.git.gitgitgadget@gmail.com>
Date: Fri, 06 Jan 2023 20:36:41 +0000
Subject: [PATCH 4/8] bundle-uri: download in creationToken order
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee , Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The creationToken heuristic provides an ordering on the bundles advertised by a bundle list. Teach the Git client to download bundles differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken values, then downloaded in decreasing order. This avoids the previous strategy of downloading bundles in an arbitrary order and attempting to apply them (likely failing in the case of required commits) until discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in increasing order, since that would prevent the need to attempt unbundling a bundle with required commits that do not exist in our empty object store. The cost of testing an unbundle is quite low, so the chosen order instead optimizes for a future bundle download during a 'git fetch' operation with a non-empty object store.
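Sorting in decreasing order only needs a `qsort()` comparator with the usual signs inverted. The sketch below mirrors `compare_creation_token()` from this patch. It operates on a hypothetical cut-down `struct bundle` rather than Git's `struct remote_bundle_info`. It compares with `<` and `>` instead of subtracting, because the difference of two `uint64_t` tokens cannot be represented in the comparator's `int` return value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct remote_bundle_info. */
struct bundle {
	const char *uri;
	uint64_t creation_token;
};

/* Order bundle pointers by decreasing creation token. */
static int cmp_token_desc(const void *va, const void *vb)
{
	const struct bundle *const *a = va;
	const struct bundle *const *b = vb;

	if ((*a)->creation_token > (*b)->creation_token)
		return -1;
	if ((*a)->creation_token < (*b)->creation_token)
		return 1;
	return 0;
}

/* Sort an array of pointers in place; the bundles themselves stay put. */
static void sort_by_token_desc(struct bundle **items, size_t nr)
{
	if (nr < 2)
		return; /* nothing to sort; also avoids qsort(NULL, 0, ...) */
	assert(items != NULL);
	qsort(items, nr, sizeof(*items), cmp_token_desc);
}
```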
Since the Git client continues fetching from the Git remote after downloading and unbundling bundles, the client's object store can be ahead of the bundle provider's object store. The next time it attempts to download from the bundle list, it makes most sense to download only the most-recent bundles until all tips successfully unbundle. The strategy implemented here provides that short-circuit where the client downloads a minimal set of bundles.

A later implementation detail will store the maximum creationToken seen during such a bundle download, and the client will avoid downloading a bundle unless its creationToken is strictly greater than that stored value. For now, if the client downloads again from a bundle list that has not changed since its previous download, it will download only the most-recent bundle and then stop, since that bundle's required commits are already in the object store.

Add tests that exercise this behavior; we will expand upon these tests when incremental downloads during 'git fetch' make use of creationToken values.
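The push/pop walk described in this patch can be simulated without any network or object-store access. The following sketch is a model, not Git's code, and it rests on several assumptions:

- Every download succeeds.
- The bundles form a hypothetical linear history in which each bundle requires only the commits of the next-older one.
- The object store is reduced to a single counter `have`, the newest generation already present.

The names `struct sim`, `try_unbundle()` and `walk()` are made up for the illustration. The sketch marks a bundle as applied as soon as it unbundles, so the walk never retries it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BUNDLES 16

/*
 * Hypothetical model: index 0 is the newest bundle (highest creation
 * token) and index nr - 1 the oldest. The bundle at index i holds
 * generation nr - i of a linear history and requires generation
 * nr - i - 1 to be present already (generation 1 requires nothing).
 */
struct sim {
	size_t nr;
	bool downloaded[MAX_BUNDLES];
	bool unbundled[MAX_BUNDLES];
	size_t downloads;
};

/* '*have' is the newest generation already in the object store. */
static bool try_unbundle(const struct sim *s, size_t i, unsigned *have)
{
	unsigned gen = (unsigned)(s->nr - i);

	if (*have + 1 < gen)
		return false; /* prerequisite commits missing */
	if (gen > *have)
		*have = gen;
	return true;
}

/* Returns 0 if the walk ends by popping off the top (success). */
static int walk(struct sim *s, unsigned *have)
{
	long cur = 0;
	int dir = 0;

	assert(s->nr <= MAX_BUNDLES);
	while (cur >= 0 && (size_t)cur < s->nr) {
		size_t i = (size_t)cur;

		if (!s->downloaded[i]) {
			s->downloaded[i] = true; /* assume download succeeds */
			s->downloads++;
		}
		if (!s->unbundled[i]) {
			if (try_unbundle(s, i, have)) {
				s->unbundled[i] = true;
				dir = -1; /* pop: retry newer bundles */
			} else {
				dir = 1; /* push: look at an older bundle */
			}
		}
		/* Already unbundled: keep moving in the same direction. */
		cur += dir;
	}
	return cur >= 0;
}
```

With an empty store, the walk downloads all four bundles and applies them oldest-first. When the store already holds the first three generations, it downloads only the newest bundle and stops, which is the short-circuit described above.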
Signed-off-by: Derrick Stolee
---
 bundle-uri.c                | 140 +++++++++++++++++++++++++++++++++++-
 t/t5558-clone-bundle-uri.sh |  41 ++++++++++-
 t/t5601-clone.sh            |  50 +++++++++++++
 3 files changed, 227 insertions(+), 4 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 63e2cc21057..b30c85ba6f2 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -434,6 +434,124 @@ static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data
 	return 0;
 }
 
+struct sorted_bundle_list {
+	struct remote_bundle_info **items;
+	size_t alloc;
+	size_t nr;
+};
+
+static int insert_bundle(struct remote_bundle_info *bundle, void *data)
+{
+	struct sorted_bundle_list *list = data;
+	list->items[list->nr++] = bundle;
+	return 0;
+}
+
+static int compare_creation_token(const void *va, const void *vb)
+{
+	const struct remote_bundle_info * const *a = va;
+	const struct remote_bundle_info * const *b = vb;
+
+	if ((*a)->creationToken > (*b)->creationToken)
+		return -1;
+	if ((*a)->creationToken < (*b)->creationToken)
+		return 1;
+	return 0;
+}
+
+static int fetch_bundles_by_token(struct repository *r,
+				  struct bundle_list *list)
+{
+	int cur;
+	int pop_or_push = 0;
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = list,
+		.mode = list->mode,
+	};
+	struct sorted_bundle_list sorted = {
+		.alloc = hashmap_get_size(&list->bundles),
+	};
+
+	ALLOC_ARRAY(sorted.items, sorted.alloc);
+
+	for_all_bundles_in_list(list, insert_bundle, &sorted);
+
+	QSORT(sorted.items, sorted.nr, compare_creation_token);
+
+	/*
+	 * Use a stack-based approach to download the bundles and attempt
+	 * to unbundle them in decreasing order by creation token. If we
+	 * fail to unbundle (after a successful download) then move to the
+	 * next non-downloaded bundle (push to the stack) and attempt
+	 * downloading. Once we succeed in applying a bundle, move to the
+	 * previous unapplied bundle (pop the stack) and attempt to unbundle
+	 * it again.
+	 *
+	 * In the case of a fresh clone, we will likely download all of the
+	 * bundles before successfully unbundling the oldest one, then the
+	 * rest of the bundles unbundle successfully in increasing order
+	 * of creationToken.
+	 *
+	 * If there are existing objects, then this process may terminate
+	 * early when all required commits from "new" bundles exist in the
+	 * repo's object store.
+	 */
+	cur = 0;
+	while (cur >= 0 && cur < sorted.nr) {
+		struct remote_bundle_info *bundle = sorted.items[cur];
+		if (!bundle->file) {
+			/* Not downloaded yet. Try downloading. */
+			if (download_bundle_to_file(bundle, &ctx)) {
+				/* Failure. Push to the stack. */
+				pop_or_push = 1;
+				goto stack_operation;
+			}
+
+			/* We expect bundles when using creationTokens. */
+			if (!is_bundle(bundle->file, 1)) {
+				warning(_("file downloaded from '%s' is not a bundle"),
+					bundle->uri);
+				break;
+			}
+		}
+
+		if (bundle->file && !bundle->unbundled) {
+			/*
+			 * This was downloaded, but not successfully
+			 * unbundled. Try unbundling again.
+			 */
+			if (unbundle_from_file(ctx.r, bundle->file)) {
+				/* Failed to unbundle. Push to stack. */
+				pop_or_push = 1;
+			} else {
+				/* Succeeded in unbundle. Pop stack. */
+				pop_or_push = -1;
+				bundle->unbundled = 1;
+			}
+		}
+
+		/*
+		 * Else case: downloaded and unbundled successfully.
+		 * Skip this by moving in the same direction as the
+		 * previous step.
+		 */
+
+stack_operation:
+		/* Move in the specified direction and repeat. */
+		cur += pop_or_push;
+	}
+
+	free(sorted.items);
+
+	/*
+	 * We succeed if the loop terminates because 'cur' drops below
+	 * zero. The other case is that we terminate because 'cur'
+	 * reaches the end of the list, so we have a failure no matter
+	 * which bundles we apply from the list.
+ */ + return cur >= 0; +} + static int download_bundle_list(struct repository *r, struct bundle_list *local_list, struct bundle_list *global_list, @@ -471,7 +589,14 @@ static int fetch_bundle_list_in_config_format(struct repository *r, goto cleanup; } - if ((result = download_bundle_list(r, &list_from_bundle, + /* + * If this list uses the creationToken heuristic, then the URIs + * it advertises are expected to be bundles, not nested lists. + * We can drop 'global_list' and 'depth'. + */ + if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) + result = fetch_bundles_by_token(r, &list_from_bundle); + else if ((result = download_bundle_list(r, &list_from_bundle, global_list, depth))) goto cleanup; @@ -613,6 +738,14 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list) int result; struct bundle_list global_list; + /* + * If the creationToken heuristic is used, then the URIs + * advertised by 'list' are not nested lists and instead + * direct bundles. We do not need to use global_list. + */ + if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) + return fetch_bundles_by_token(r, list); + init_bundle_list(&global_list); /* If a bundle is added to this global list, then it is required. 
*/ @@ -621,7 +754,10 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list) if ((result = download_bundle_list(r, list, &global_list, 0))) goto cleanup; - result = unbundle_all_bundles(r, &global_list); + if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) + result = fetch_bundles_by_token(r, list); + else + result = unbundle_all_bundles(r, &global_list); cleanup: for_all_bundles_in_list(&global_list, unlink_bundle, NULL); diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index 328caeeae9a..d7461ec907e 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -368,6 +368,8 @@ test_expect_success 'clone bundle list (HTTP, any mode)' ' ' test_expect_success 'clone bundle list (http, creationToken)' ' + test_when_finished rm -f trace*.txt && + cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" && cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && [bundle] @@ -392,10 +394,45 @@ test_expect_success 'clone bundle list (http, creationToken)' ' creationToken = 4 EOF - git clone --bundle-uri="$HTTPD_URL/bundle-list" . 
clone-list-http-2 && + GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \ + git clone --bundle-uri="$HTTPD_URL/bundle-list" \ + clone-from clone-list-http-2 && git -C clone-from for-each-ref --format="%(objectname)" >oids && - git -C clone-list-http-2 cat-file --batch-check "$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = bundle-1.bundle + creationToken = 1 + + [bundle "bundle-2"] + uri = bundle-2.bundle + creationToken = 2 + EOF + + GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \ + git clone --bundle-uri="$HTTPD_URL/bundle-list" \ + clone-from clone-token-http && + + test_bundle_downloaded bundle-1.bundle trace-clone.txt && + test_bundle_downloaded bundle-2.bundle trace-clone.txt ' # Do not add tests here unless they use the HTTP server, as they will diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 1928ea1dd7c..57476b6e6d7 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -831,6 +831,56 @@ test_expect_success 'auto-discover multiple bundles from HTTP clone' ' grep -f pattern trace.txt ' +# Usage: test_bundle_downloaded +test_bundle_downloaded () { + cat >pattern <<-EOF && + "event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1.bundle"\] + EOF + grep -f pattern "$2" +} + +test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' ' + test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" && + test_when_finished rm -rf clone-heuristic trace*.txt && + + test_commit -C src newest && + git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD && + git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" && + + cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF && + [uploadPack] + advertiseBundleURIs = true + + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "everything"] + uri = $HTTPD_URL/everything.bundle + creationtoken = 1 + + [bundle "new"] + uri 
= $HTTPD_URL/new.bundle + creationtoken = 2 + + [bundle "newest"] + uri = $HTTPD_URL/newest.bundle + creationtoken = 3 + EOF + + GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \ + git -c protocol.version=2 \ + -c transfer.bundleURI=true clone \ + "$HTTPD_URL/smart/repo4.git" clone-heuristic && + + # We should fetch all bundles + for b in everything new newest + do + test_bundle_downloaded $b trace-clone.txt || return 1 + done +' + # DO NOT add non-httpd-specific tests here, because the last part of this # test script is only executed when httpd is available and enabled.
From patchwork Fri Jan 6 20:36:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13091825 Message-Id: In-Reply-To: References: Date: Fri, 06 Jan 2023 20:36:42 +0000 Subject: [PATCH 5/8] clone: set fetch.bundleURI if appropriate To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com,
Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Bundle providers may organize their bundle lists in a way that is intended to improve incremental fetches, not just initial clones. However, they do need to state that they have organized the list with that in mind, or else the client will not expect to save time by downloading bundles after the initial clone. This is done by specifying a bundle.heuristic value. There are two types of bundle lists: those at a static URI and those that are advertised from a Git remote over protocol v2. The new fetch.bundleURI config value applies to static bundle URIs that are not advertised over protocol v2. If the user specifies a static URI via 'git clone --bundle-uri', then Git can set this config as a reminder for future 'git fetch' operations to check the bundle list before connecting to the remote(s). For lists provided over protocol v2, we will want to take a different approach and create a property of the remote itself by creating a remote..* type config key. That is not implemented in this change. Later changes will update 'git fetch' to consume this option. Signed-off-by: Derrick Stolee --- Documentation/config/fetch.txt | 8 ++++++++ builtin/clone.c | 6 +++++- bundle-uri.c | 10 +++++++--- bundle-uri.h | 8 +++++++- t/t5558-clone-bundle-uri.sh | 33 +++++++++++++++++++++++++++++++++ 5 files changed, 60 insertions(+), 5 deletions(-) diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt index cd65d236b43..4f796218aab 100644 --- a/Documentation/config/fetch.txt +++ b/Documentation/config/fetch.txt @@ -96,3 +96,11 @@ fetch.writeCommitGraph:: merge and the write may take longer. Having an updated commit-graph file helps performance of many Git commands, including `git merge-base`, `git push -f`, and `git log --graph`. Defaults to false.
+ +fetch.bundleURI:: + This value stores a URI for fetching Git object data from a bundle URI + before performing an incremental fetch from the origin Git server. If + the value is `` then running `git fetch ` is equivalent to + first running `git fetch --bundle-uri=` immediately before + `git fetch `. See details of the `--bundle-uri` option in + linkgit:git-fetch[1]. diff --git a/builtin/clone.c b/builtin/clone.c index 5453ba5277f..5370617664d 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -1248,12 +1248,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix) * data from the --bundle-uri option. */ if (bundle_uri) { + int has_heuristic = 0; + /* At this point, we need the_repository to match the cloned repo. */ if (repo_init(the_repository, git_dir, work_tree)) warning(_("failed to initialize the repo, skipping bundle URI")); - else if (fetch_bundle_uri(the_repository, bundle_uri)) + else if (fetch_bundle_uri(the_repository, bundle_uri, &has_heuristic)) warning(_("failed to fetch objects from bundle URI '%s'"), bundle_uri); + else if (has_heuristic) + git_config_set_gently("fetch.bundleuri", bundle_uri); } strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD"); diff --git a/bundle-uri.c b/bundle-uri.c index b30c85ba6f2..1dbbbb980eb 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -594,9 +594,10 @@ static int fetch_bundle_list_in_config_format(struct repository *r, * it advertises are expected to be bundles, not nested lists. * We can drop 'global_list' and 'depth'. 
*/ - if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) + if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) { result = fetch_bundles_by_token(r, &list_from_bundle); - else if ((result = download_bundle_list(r, &list_from_bundle, + global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN; + } else if ((result = download_bundle_list(r, &list_from_bundle, global_list, depth))) goto cleanup; @@ -707,7 +708,8 @@ static int unlink_bundle(struct remote_bundle_info *info, void *data) return 0; } -int fetch_bundle_uri(struct repository *r, const char *uri) +int fetch_bundle_uri(struct repository *r, const char *uri, + int *has_heuristic) { int result; struct bundle_list list; @@ -727,6 +729,8 @@ int fetch_bundle_uri(struct repository *r, const char *uri) result = unbundle_all_bundles(r, &list); cleanup: + if (has_heuristic) + *has_heuristic = (list.heuristic != BUNDLE_HEURISTIC_NONE); for_all_bundles_in_list(&list, unlink_bundle, NULL); clear_bundle_list(&list); clear_remote_bundle_info(&bundle, NULL); diff --git a/bundle-uri.h b/bundle-uri.h index 1cae418211b..52b27cd10e3 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -124,8 +124,14 @@ int bundle_uri_parse_config_format(const char *uri, * based on that information. * * Returns non-zero if no bundle information is found at the given 'uri'. + * + * If the pointer 'has_heuristic' is non-NULL, then the value it points to + * will be set to be non-zero if and only if the fetched list has a + * heuristic value. Such a value indicates that the list was designed for + * incremental fetches. 
*/ -int fetch_bundle_uri(struct repository *r, const char *uri); +int fetch_bundle_uri(struct repository *r, const char *uri, + int *has_heuristic); /** * Given a bundle list that was already advertised (likely by the diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index d7461ec907e..8ff560425ee 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -435,6 +435,39 @@ test_expect_success 'clone bundle list (http, creationToken)' ' test_bundle_downloaded bundle-2.bundle trace-clone.txt ' +test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' + test_when_finished rm -rf fetch-http-4 trace*.txt && + + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = bundle-1.bundle + creationToken = 1 + EOF + + GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \ + git clone --single-branch --branch=base \ + --bundle-uri="$HTTPD_URL/bundle-list" \ + "$HTTPD_URL/smart/fetch.git" fetch-http-4 && + + test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri && + + # The clone should copy two files: the list and bundle-1. + test_bundle_downloaded bundle-list trace-clone.txt && + test_bundle_downloaded bundle-1.bundle trace-clone.txt && + + # only received base ref from bundle-1 + git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + cat >expect <<-\EOF && + refs/bundles/base + EOF + test_cmp expect refs +' + # Do not add tests here unless they use the HTTP server, as they will # not run unless the HTTP dependencies exist. 
From patchwork Fri Jan 6 20:36:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13091824 Message-Id: In-Reply-To: References: Date: Fri, 06 Jan 2023 20:36:43 +0000 Subject: [PATCH 6/8] bundle-uri: drop bundle.flag from design doc To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The Implementation Plan section lists a 'bundle.flag' option that is not documented anywhere else. What is documented elsewhere in the document and implemented by previous changes is the 'bundle.heuristic' config key. For now, a heuristic is required to indicate that a bundle list is organized for use during 'git fetch', and it is also sufficient for all existing designs.
Signed-off-by: Derrick Stolee --- Documentation/technical/bundle-uri.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt index b78d01d9adf..91d3a13e327 100644 --- a/Documentation/technical/bundle-uri.txt +++ b/Documentation/technical/bundle-uri.txt @@ -479,14 +479,14 @@ outline for submitting these features: (This choice is an opt-in via a config option and a command-line option.) -4. Allow the client to understand the `bundle.flag=forFetch` configuration +4. Allow the client to understand the `bundle.heuristic` configuration key and the `bundle..creationToken` heuristic. When `git clone` - discovers a bundle URI with `bundle.flag=forFetch`, it configures the - client repository to check that bundle URI during later `git fetch ` + discovers a bundle URI with `bundle.heuristic`, it configures the client + repository to check that bundle URI during later `git fetch ` commands. 5. Allow clients to discover bundle URIs during `git fetch` and configure - a bundle URI for later fetches if `bundle.flag=forFetch`. + a bundle URI for later fetches if `bundle.heuristic` is set. 6. Implement the "inspect headers" heuristic to reduce data downloads when the `bundle..creationToken` heuristic is not available. 
From patchwork Fri Jan 6 20:36:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13091826 Message-Id: <1627fc158b1e301a1663e24f9f21268b4f1caa55.1673037405.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 06 Jan 2023 20:36:44 +0000 Subject: [PATCH 7/8] fetch: fetch from an external bundle URI To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee When a user specifies a URI via 'git clone --bundle-uri', that URI may be a bundle list that advertises a 'bundle.heuristic' value. In that case, the Git client stores that URI in the 'fetch.bundleURI' config value. Teach 'git fetch' to check for this config value and download bundles from that URI before fetching from the Git remote(s).
Likely, the bundle provider has configured a heuristic (such as "creationToken") that will allow the Git client to download only a portion of the bundles before continuing the fetch. Since this URI is completely independent of the remote server, we want to be sure that we connect to the bundle URI before creating a connection to the Git remote. We do not want to hold a stateful connection for too long if we can avoid it. To test that this works correctly, extend the previous tests that set 'fetch.bundleURI' to do follow-up fetches. The bundle list is updated incrementally at each phase to demonstrate that the heuristic avoids downloading older bundles. This includes the middle fetch downloading the objects in bundle-3.bundle from the Git remote, and therefore not needing that bundle in the third fetch. Signed-off-by: Derrick Stolee --- builtin/fetch.c | 8 +++++ t/t5558-clone-bundle-uri.sh | 59 +++++++++++++++++++++++++++++++++++++ 2 files changed, 67 insertions(+) diff --git a/builtin/fetch.c b/builtin/fetch.c index 7378cafeec9..fbb1d470c38 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -29,6 +29,7 @@ #include "commit-graph.h" #include "shallow.h" #include "worktree.h" +#include "bundle-uri.h" #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000) @@ -2109,6 +2110,7 @@ static int fetch_one(struct remote *remote, int argc, const char **argv, int cmd_fetch(int argc, const char **argv, const char *prefix) { int i; + const char *bundle_uri; struct string_list list = STRING_LIST_INIT_DUP; struct remote *remote = NULL; int result = 0; @@ -2194,6 +2196,12 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) if (dry_run) write_fetch_head = 0; + if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) && + !starts_with(bundle_uri, "remote:")) { + if (fetch_bundle_uri(the_repository, bundle_uri, NULL)) + warning(_("failed to fetch bundles from '%s'"), bundle_uri); + } + if (all) { if (argc == 1) die(_("fetch --all does not take a repository 
argument")); diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index 8ff560425ee..3f4d61a915c 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -465,6 +465,65 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' cat >expect <<-\EOF && refs/bundles/base EOF + test_cmp expect refs && + + cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle "bundle-2"] + uri = bundle-2.bundle + creationToken = 2 + EOF + + # Fetch the objects for bundle-2 _and_ bundle-3. + GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \ + git -C fetch-http-4 fetch origin --no-tags \ + refs/heads/left:refs/heads/left \ + refs/heads/right:refs/heads/right && + + # This fetch should copy two files: the list and bundle-2. + test_bundle_downloaded bundle-list trace1.txt && + test_bundle_downloaded bundle-2.bundle trace1.txt && + ! test_bundle_downloaded bundle-1.bundle trace1.txt && + + # received left from bundle-2 + git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + EOF + test_cmp expect refs && + + cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + + [bundle "bundle-4"] + uri = bundle-4.bundle + creationToken = 4 + EOF + + # This fetch should skip bundle-3.bundle, since its objects are + # already local (we have the requisite commits for bundle-4.bundle). + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C fetch-http-4 fetch origin --no-tags \ + refs/heads/merge:refs/heads/merge && + + # This fetch should copy three files: the list, bundle-3, and bundle-4. + test_bundle_downloaded bundle-list trace2.txt && + test_bundle_downloaded bundle-4.bundle trace2.txt && + ! test_bundle_downloaded bundle-1.bundle trace2.txt && + ! test_bundle_downloaded bundle-2.bundle trace2.txt && + !
test_bundle_downloaded bundle-3.bundle trace2.txt && + + # received merge ref from bundle-4, but right is missing + # because we did not download bundle-3. + git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + refs/bundles/merge + EOF test_cmp expect refs '
From patchwork Fri Jan 6 20:36:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13091828 Message-Id: <51f210ddeb46fb06e885dc384a486c4bb16ad8cd.1673037405.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 06 Jan 2023 20:36:45 +0000 Subject: [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee When a bundle list specifies the
"creationToken" heuristic, the Git client downloads the list and then starts downloading bundles in descending creationToken order. This process stops as soon as all downloaded bundles can be applied to the repository (because all required commits are present in the repository or in the downloaded bundles). When checking the same bundle list twice, this strategy requires downloading the bundle with the maximum creationToken again, which is wasteful. The creationToken heuristic promises that the client will not have a use for that bundle if its creationToken value is at most the maximum creationToken value from the previous fetch. To prevent these wasteful downloads, create a fetch.bundleCreationToken config setting that the Git client sets after successfully downloading and applying bundles. This value allows the client to skip downloading the maximum bundle when the stored value is equal to or larger than that bundle's creationToken. To test that this works correctly, we can insert some "duplicate" fetches into existing tests and demonstrate that only the bundle list is downloaded. The previous logic for downloading bundles by creationToken worked even if the bundle list was empty, but now we have logic that depends on the first entry of the list. Terminate early in the (nonsensical) case of an empty bundle list. Signed-off-by: Derrick Stolee --- Documentation/config/fetch.txt | 8 ++++++++ bundle-uri.c | 35 ++++++++++++++++++++++++++++++++-- t/t5558-clone-bundle-uri.sh | 25 +++++++++++++++++++++++- 3 files changed, 65 insertions(+), 3 deletions(-) diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt index 4f796218aab..96755ba148b 100644 --- a/Documentation/config/fetch.txt +++ b/Documentation/config/fetch.txt @@ -104,3 +104,11 @@ fetch.bundleURI:: first running `git fetch --bundle-uri=` immediately before `git fetch `. See details of the `--bundle-uri` option in linkgit:git-fetch[1].
+ +fetch.bundleCreationToken:: + When using `fetch.bundleURI` to fetch incrementally from a bundle + list that uses the "creationToken" heuristic, this config value + stores the maximum `creationToken` value of the downloaded bundles. + This value is used to prevent downloading bundles in the future + if the advertised `creationToken` is not strictly larger than this + value. diff --git a/bundle-uri.c b/bundle-uri.c index 1dbbbb980eb..98655bd6721 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -464,6 +464,8 @@ static int fetch_bundles_by_token(struct repository *r, { int cur; int pop_or_push = 0; + const char *creationTokenStr; + uint64_t maxCreationToken; struct bundle_list_context ctx = { .r = r, .list = list, @@ -477,8 +479,27 @@ static int fetch_bundles_by_token(struct repository *r, for_all_bundles_in_list(list, insert_bundle, &sorted); + if (!sorted.nr) { + free(sorted.items); + return 0; + } + QSORT(sorted.items, sorted.nr, compare_creation_token); + /* + * If fetch.bundleCreationToken exists, parses to a uint64_t, and + * is not strictly smaller than the maximum creation token in the + * bundle list, then do not download any bundles. + */ + if (!repo_config_get_value(r, + "fetch.bundlecreationtoken", + &creationTokenStr) && + sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 && + sorted.items[0]->creationToken <= maxCreationToken) { + free(sorted.items); + return 0; + } + /* * Use a stack-based approach to download the bundles and attempt * to unbundle them in decreasing order by creation token. If we @@ -541,14 +562,24 @@ stack_operation: cur += pop_or_push; } - free(sorted.items); - /* * We succeed if the loop terminates because 'cur' drops below * zero. The other case is that we terminate because 'cur' * reaches the end of the list, so we have a failure no matter * which bundles we apply from the list.
*/ + if (cur < 0) { + struct strbuf value = STRBUF_INIT; + strbuf_addf(&value, "%"PRIu64"", sorted.items[0]->creationToken); + if (repo_config_set_multivar_gently(ctx.r, + "fetch.bundleCreationToken", + value.buf, NULL, 0)) + warning(_("failed to store maximum creation token")); + + strbuf_release(&value); + } + + free(sorted.items); return cur >= 0; } diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index 3f4d61a915c..0604d721f1b 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -455,6 +455,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' "$HTTPD_URL/smart/fetch.git" fetch-http-4 && test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri && + test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken && # The clone should copy two files: the list and bundle-1. test_bundle_downloaded bundle-list trace-clone.txt && @@ -479,6 +480,8 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' refs/heads/left:refs/heads/left \ refs/heads/right:refs/heads/right && + test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken && + # This fetch should copy two files: the list and bundle-2. test_bundle_downloaded bundle-list trace1.txt && test_bundle_downloaded bundle-2.bundle trace1.txt && @@ -492,6 +495,15 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' EOF test_cmp expect refs && + # No-op fetch + GIT_TRACE2_EVENT="$(pwd)/trace1b.txt" \ + git -C fetch-http-4 fetch origin --no-tags \ + refs/heads/left:refs/heads/left \ + refs/heads/right:refs/heads/right && + test_bundle_downloaded bundle-list trace1b.txt && + ! test_bundle_downloaded bundle-1.bundle trace1b.txt && + ! 
test_bundle_downloaded bundle-2.bundle trace1b.txt && + cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && [bundle "bundle-3"] uri = bundle-3.bundle @@ -508,6 +520,8 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' git -C fetch-http-4 fetch origin --no-tags \ refs/heads/merge:refs/heads/merge && + test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken && + # This fetch should copy three files: the list, bundle-3, and bundle-4. test_bundle_downloaded bundle-list trace2.txt && test_bundle_downloaded bundle-4.bundle trace2.txt && @@ -524,7 +538,16 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' refs/bundles/left refs/bundles/merge EOF - test_cmp expect refs + test_cmp expect refs && + + # No-op fetch + GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \ + git -C fetch-http-4 fetch origin && + test_bundle_downloaded bundle-list trace2b.txt && + ! test_bundle_downloaded bundle-1.bundle trace2b.txt && + ! test_bundle_downloaded bundle-2.bundle trace2b.txt && + ! test_bundle_downloaded bundle-3.bundle trace2b.txt && + ! test_bundle_downloaded bundle-4.bundle trace2b.txt ' # Do not add tests here unless they use the HTTP server, as they will