From patchwork Thu Apr 2 19:19:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 11471351 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BE7CC1668 for ; Thu, 2 Apr 2020 19:19:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9CBF32080C for ; Thu, 2 Apr 2020 19:19:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="njFdZPYd" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389545AbgDBTT2 (ORCPT ); Thu, 2 Apr 2020 15:19:28 -0400 Received: from mail-pg1-f201.google.com ([209.85.215.201]:42845 "EHLO mail-pg1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732937AbgDBTT2 (ORCPT ); Thu, 2 Apr 2020 15:19:28 -0400 Received: by mail-pg1-f201.google.com with SMTP id g10so3858561pgg.9 for ; Thu, 02 Apr 2020 12:19:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=bmeGpckLvyJH+KO+r8gy9KGUkRNvdgUpS/eIbC3Vj9c=; b=njFdZPYdb/rkogAJx5GmiqF6b9Yu6VxNi+DwUfUKYk+IvIBLZphPb31kg4Efgdo67E oZ0Vgwzc87Gnc0o9u8ED24oq1GyRUJ98RhCFNg+qVX66cK/Bf1gjb2SWWg3Dg3pj+gjl iJKq7hy+dOxHZJGrOWctalv56s3iXb/wPvzDTqrIKbj4uqi0h+rCdGP28E+C85W7kDIE 06HmkrWaZp9aMymjPUDPY3t8sxt6Wi1tnsEJn6SweLxaxCghnFTP1M+PuGq8oDxx1ptO iUJUtfA15xpsHDVulEs5nFDYxD26UowHSvRoVH03wNEWRwxRzzKlSIG7bm1q/U8+n7Hf 876Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=bmeGpckLvyJH+KO+r8gy9KGUkRNvdgUpS/eIbC3Vj9c=; b=j/+tA7S819MURHcSEamQLKzcEUleJAlTc7lUXV4NBBxK13I0irHp7JfSxwtCnARuwE z3VzvTlBpgReKLvuQVFCulLqcG35i83XSm+ywu1xbR5gDG+7hPaNICjAbgCiFFMcmASr aHhIbo0wqQxu+YYmHuYdxX0tPjvVMHrHXUQHc8zoTsnk15QRVRLM6dxTKRaQfGEDhc/8 suqIUpaRwpiM82J+bi69C4ig0sCMZlnc0UXy7fu5XdJulCgpHPrisw0GZr8PEPZh2hHj 6jFIeyuv44Mz7IO1oEy+6AxvzyrPkjaUdXMznpDlZoCpBOOh0bxOS1FHcVf67+c+xZ8c Rb/w== X-Gm-Message-State: AGi0PuaES3rO5LN4tQARfguZz3MJWQhnW2z7qpwSaBXnZ5SB76SwqjSG bZUR9Zup2VSwXNhZoAheQQUZvD91Pi1ucYOPv7uxwSOQIl8Evn+CQVmQKHVwz4rhZM998vuOi22 IFd+hLaY4FCJu3PHSAgqCTVjpygW4zSZOmhF/0V6W0IPRM760QkxQG+DWdurE1/lkBTrapOFQDa Zi X-Google-Smtp-Source: APiQypKASzfwmKuXLoAUtlxPakZzsqWzu1Qw4yIVfa8jOBj5O0jWHRsC6febPXV6c2a3NGrGIqauH4/+KkJn4p7Wio2q X-Received: by 2002:a17:90a:ab0a:: with SMTP id m10mr5539158pjq.105.1585855166836; Thu, 02 Apr 2020 12:19:26 -0700 (PDT) Date: Thu, 2 Apr 2020 12:19:16 -0700 In-Reply-To: Message-Id: <474eb27d9c136fb69e961546004cfb531d722e2c.1585854639.git.jonathantanmy@google.com> Mime-Version: 1.0 References: <20200331020418.55640-1-jonathantanmy@google.com> X-Mailer: git-send-email 2.26.0.292.g33ef6b2f38-goog Subject: [PATCH v2 1/2] promisor-remote: accept 0 as oid_nr in function From: Jonathan Tan To: git@vger.kernel.org Cc: Jonathan Tan , gitster@pobox.com, stolee@gmail.com Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org There are 3 callers to promisor_remote_get_direct() that first check if the number of objects to be fetched is equal to 0. Fold that check into promisor_remote_get_direct(), and in doing so, be explicit as to what promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success, immediately). Signed-off-by: Jonathan Tan --- builtin/index-pack.c | 5 ++--- diff.c | 11 +++++------ promisor-remote.c | 3 +++ promisor-remote.h | 8 ++++++++ unpack-trees.c | 5 ++--- 5 files changed, 20 insertions(+), 12 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index d967d188a3..f176dd28c8 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1368,9 +1368,8 @@ static void fix_unresolved_deltas(struct hashfile *f) continue; oid_array_append(&to_fetch, &d->oid); } - if (to_fetch.nr) - promisor_remote_get_direct(the_repository, - to_fetch.oid, to_fetch.nr); + promisor_remote_get_direct(the_repository, + to_fetch.oid, to_fetch.nr); oid_array_clear(&to_fetch); } diff --git a/diff.c b/diff.c index 1010d806f5..f01b4d91b8 100644 --- a/diff.c +++ b/diff.c @@ -6520,12 +6520,11 @@ void diffcore_std(struct diff_options *options) add_if_missing(options->repo, &to_fetch, p->one); add_if_missing(options->repo, &to_fetch, p->two); } - if (to_fetch.nr) - /* - * NEEDSWORK: Consider deduplicating the OIDs sent. - */ - promisor_remote_get_direct(options->repo, - to_fetch.oid, to_fetch.nr); + /* + * NEEDSWORK: Consider deduplicating the OIDs sent. + */ + promisor_remote_get_direct(options->repo, + to_fetch.oid, to_fetch.nr); oid_array_clear(&to_fetch); } diff --git a/promisor-remote.c b/promisor-remote.c index 9f338c945f..2155dfe657 100644 --- a/promisor-remote.c +++ b/promisor-remote.c @@ -241,6 +241,9 @@ int promisor_remote_get_direct(struct repository *repo, int to_free = 0; int res = -1; + if (oid_nr == 0) + return 0; + promisor_remote_init(); for (r = promisors; r; r = r->next) { diff --git a/promisor-remote.h b/promisor-remote.h index 737bac3a33..6343c47d18 100644 --- a/promisor-remote.h +++ b/promisor-remote.h @@ -20,6 +20,14 @@ struct promisor_remote { void promisor_remote_reinit(void); struct promisor_remote *promisor_remote_find(const char *remote_name); int has_promisor_remote(void); + +/* + * Fetches all requested objects from all promisor remotes, trying them one at + * a time until all objects are fetched. Returns 0 upon success, and non-zero + * otherwise. + * + * If oid_nr is 0, this function returns 0 (success) immediately. + */ int promisor_remote_get_direct(struct repository *repo, const struct object_id *oids, int oid_nr); diff --git a/unpack-trees.c b/unpack-trees.c index f618a644ef..4c3191b947 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -423,9 +423,8 @@ static int check_updates(struct unpack_trees_options *o) continue; oid_array_append(&to_fetch, &ce->oid); } - if (to_fetch.nr) - promisor_remote_get_direct(the_repository, - to_fetch.oid, to_fetch.nr); + promisor_remote_get_direct(the_repository, + to_fetch.oid, to_fetch.nr); oid_array_clear(&to_fetch); } for (i = 0; i < index->cache_nr; i++) { From patchwork Thu Apr 2 19:19:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 11471353 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 38E9F14DD for ; Thu, 2 Apr 2020 19:19:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0D3BA20737 for ; Thu, 2 Apr 2020 19:19:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="EsMIvKgv" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389567AbgDBTTb (ORCPT ); Thu, 2 Apr 2020 15:19:31 -0400 Received: from mail-pf1-f201.google.com ([209.85.210.201]:45057 "EHLO mail-pf1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732937AbgDBTTa (ORCPT ); Thu, 2 Apr 2020 15:19:30 -0400 Received: by mail-pf1-f201.google.com with SMTP id a188so3804533pfa.12 for ; Thu, 02 Apr 2020 12:19:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=nJH+XkRCLKStzMVeqrYC5JZuRAXjYrckIk0im04C90w=; b=EsMIvKgv1lg4Ixe0jHa0stHnCRUj26d3i+3IxacTaH0Tz4hTKqeJlRXoQHOsUSYVVa 1mjpQYZr2b02G65B0iA03PwecF+trY5mDbwKH3gWAWPYEIUGKfE7FnSoFEAjz7BNLz0R wg4pDTKPm8ZuRcBP7JB/Pk12v/Yyed3F7oHLfWl6tUHgqo/+EN7a0tsp7pM5G8HIKh0R X/Pvw3CBaiWkd3H9bOg4chogkIqXsi08KTmYCaKRNVllgl6dysBISPDBcozHo4uZm5HU PSqZP+h5NOXPB6r/DeKL4YjvyEbNgiQEV2+XySJnJLlv8rHUiDGYpd44z93Y9agq/U4f YPgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=nJH+XkRCLKStzMVeqrYC5JZuRAXjYrckIk0im04C90w=; b=QVaxpwbjQnrftVeOe2uTcpuvLuvrjKZOQZK9/9jiLxs4/fl2RI6nxRkw1s187v3VPk Pz0EnDhBHnoqngYZbvn7bTJX3YHCMNHgpyEiq0Qs14FEHktegrOtDKH8rR9Tmm73PcyU N79TET0G380TfNSnyHE3YuSMPzuFXMPnZglq2tBl/fgBoJ5mMcXA5e/0xewLk7qzpx3h wAN8DMVldbM8mh4NZRwQ+Sf/aCdLLfwtYPNzPRofGuX09ItJueKJa9CdN2JmFh4GBOj9 vpqATRUP7/YLl7JFtVeKUbhz9WH6hVPHp54gMKDe6ups5zBV9f/U1/YcT7mIZNjWZ1ck Eafg== X-Gm-Message-State: AGi0PuZO5QBSclfAYhp5QL3J4WuMPXnpTmCngISm58GArgfupVe6WulD Pe8WlGREjtR96d/fKfRHQoIUmBdlPBu2gN62dWATVMGvOhxR0DMFwMKYDs8tqHh3pb+kCG5+v11 PbciS56RCvHlAUyi3EfBNmMqcpe2loZzmebQvA5QDeGwNY6J3pe13babErQHj5YRXZheZYPUJBK Fj X-Google-Smtp-Source: APiQypLRnBcGE7G/09AYwrSdktT/zs0Ib0SSZg23iJFm3kjhRGSemS3N5ffa2VgGnlsCPABKSmAeM+JHetWkG1fi+iln X-Received: by 2002:a17:90a:aa83:: with SMTP id l3mr5572898pjq.100.1585855168698; Thu, 02 Apr 2020 12:19:28 -0700 (PDT) Date: Thu, 2 Apr 2020 12:19:17 -0700 In-Reply-To: Message-Id: Mime-Version: 1.0 References: <20200331020418.55640-1-jonathantanmy@google.com> X-Mailer: git-send-email 2.26.0.292.g33ef6b2f38-goog Subject: [PATCH v2 2/2] diff: restrict when prefetching occurs From: Jonathan Tan To: git@vger.kernel.org Cc: Jonathan Tan , gitster@pobox.com, stolee@gmail.com, Jeff King Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Commit 7fbbcb21b1 ("diff: batch fetching of missing blobs", 2019-04-08) optimized "diff" by prefetching blobs in a partial clone, but there are some cases wherein blobs do not need to be prefetched. In particular, if (1) no blob data is included in the output of the diff, (2) break-rewrite detection is not requested, and (3) no inexact rename detection is needed, then no blobs are read at all. Therefore, in such a case, do not prefetch. Change diffcore_std() to only prefetch if (1) and/or (2) is not true (currently, it always prefetches); change diffcore_rename() to prefetch if (3) is not true and no prefetch has yet occurred. Helped-by: Jeff King Signed-off-by: Jonathan Tan --- diff.c | 38 +++++++++++++++++++-------- diffcore-rename.c | 37 ++++++++++++++++++++++++++- diffcore.h | 10 +++++++- t/t4067-diff-partial-clone.sh | 48 +++++++++++++++++++++++++++++++++++ 4 files changed, 121 insertions(+), 12 deletions(-) diff --git a/diff.c b/diff.c index f01b4d91b8..857f02f481 100644 --- a/diff.c +++ b/diff.c @@ -6494,9 +6494,9 @@ void diffcore_fix_diff_index(void) QSORT(q->queue, q->nr, diffnamecmp); } -static void add_if_missing(struct repository *r, - struct oid_array *to_fetch, - const struct diff_filespec *filespec) +void diff_add_if_missing(struct repository *r, + struct oid_array *to_fetch, + const struct diff_filespec *filespec) { if (filespec && filespec->oid_valid && !S_ISGITLINK(filespec->mode) && @@ -6507,24 +6507,42 @@ static void add_if_missing(struct repository *r, void diffcore_std(struct diff_options *options) { - if (options->repo == the_repository && has_promisor_remote()) { - /* - * Prefetch the diff pairs that are about to be flushed. - */ + int prefetched = 0; + int output_formats_to_prefetch = DIFF_FORMAT_DIFFSTAT | + DIFF_FORMAT_NUMSTAT | + DIFF_FORMAT_PATCH | + DIFF_FORMAT_SHORTSTAT | + DIFF_FORMAT_DIRSTAT; + + /* + * Check if the user requested a blob-data-requiring diff output and/or + * break-rewrite detection (which requires blob data). If yes, prefetch + * the diff pairs. + * + * If no prefetching occurs, diffcore_rename() will prefetch if it + * decides that it needs inexact rename detection. + */ + if (options->repo == the_repository && has_promisor_remote() && + (options->output_format & output_formats_to_prefetch || + (!options->found_follow && options->break_opt != -1))) { int i; struct diff_queue_struct *q = &diff_queued_diff; struct oid_array to_fetch = OID_ARRAY_INIT; for (i = 0; i < q->nr; i++) { struct diff_filepair *p = q->queue[i]; - add_if_missing(options->repo, &to_fetch, p->one); - add_if_missing(options->repo, &to_fetch, p->two); + diff_add_if_missing(options->repo, &to_fetch, p->one); + diff_add_if_missing(options->repo, &to_fetch, p->two); } + + prefetched = 1; + /* * NEEDSWORK: Consider deduplicating the OIDs sent. */ promisor_remote_get_direct(options->repo, to_fetch.oid, to_fetch.nr); + oid_array_clear(&to_fetch); } @@ -6537,7 +6555,7 @@ void diffcore_std(struct diff_options *options) diffcore_break(options->repo, options->break_opt); if (options->detect_rename) - diffcore_rename(options); + diffcore_rename(options, prefetched); if (options->break_opt != -1) diffcore_merge_broken(); } diff --git a/diffcore-rename.c b/diffcore-rename.c index e189f407af..79ac1b4bee 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -7,6 +7,7 @@ #include "object-store.h" #include "hashmap.h" #include "progress.h" +#include "promisor-remote.h" /* Table of rename/copy destinations */ @@ -448,7 +449,7 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i return count; } -void diffcore_rename(struct diff_options *options) +void diffcore_rename(struct diff_options *options, int prefetched) { int detect_rename = options->detect_rename; int minimum_score = options->rename_score; @@ -538,6 +539,40 @@ void diffcore_rename(struct diff_options *options) break; } + if (!prefetched) { + /* + * At this point we know there's actual work to do: we have rename + * destinations that didn't find an exact match, and we have potential + * sources. So we'll have to do inexact rename detection, which + * requires looking at the blobs. + * + * If we haven't already prefetched, it's worth pre-fetching + * them as a group now. + */ + int i; + struct oid_array to_fetch = OID_ARRAY_INIT; + + for (i = 0; i < rename_dst_nr; i++) { + if (rename_dst[i].pair) + continue; /* already found exact match */ + diff_add_if_missing(options->repo, &to_fetch, rename_dst[i].two); + } + for (i = 0; i < rename_src_nr; i++) { + if (skip_unmodified && + diff_unmodified_pair(rename_src[i].p)) + /* + * The "for" loop below will not need these + * blobs, so skip prefetching. + */ + continue; + diff_add_if_missing(options->repo, &to_fetch, rename_src[i].p->one); + } + if (to_fetch.nr) + promisor_remote_get_direct(options->repo, + to_fetch.oid, to_fetch.nr); + oid_array_clear(&to_fetch); + } + if (options->show_rename_progress) { progress = start_delayed_progress( _("Performing inexact rename detection"), diff --git a/diffcore.h b/diffcore.h index 7c07347e42..d7af6ab018 100644 --- a/diffcore.h +++ b/diffcore.h @@ -144,7 +144,7 @@ struct diff_filepair *diff_queue(struct diff_queue_struct *, void diff_q(struct diff_queue_struct *, struct diff_filepair *); void diffcore_break(struct repository *, int); -void diffcore_rename(struct diff_options *); +void diffcore_rename(struct diff_options *, int prefetched); void diffcore_merge_broken(void); void diffcore_pickaxe(struct diff_options *); void diffcore_order(const char *orderfile); @@ -182,4 +182,12 @@ int diffcore_count_changes(struct repository *r, unsigned long *src_copied, unsigned long *literal_added); +/* + * If filespec contains an OID and if that object is missing from the given + * repository, add that OID to to_fetch. + */ +void diff_add_if_missing(struct repository *r, + struct oid_array *to_fetch, + const struct diff_filespec *filespec); + #endif diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh index 4831ad35e6..c1ed1c2fc4 100755 --- a/t/t4067-diff-partial-clone.sh +++ b/t/t4067-diff-partial-clone.sh @@ -131,4 +131,52 @@ test_expect_success 'diff with rename detection batches blobs' ' test_line_count = 1 done_lines ' +test_expect_success 'diff does not fetch anything if inexact rename detection is not needed' ' + test_when_finished "rm -rf server client trace" && + + test_create_repo server && + echo a >server/a && + printf "b\nb\nb\nb\nb\n" >server/b && + git -C server add a b && + git -C server commit -m x && + mv server/b server/c && + git -C server add c && + git -C server commit -a -m x && + + test_config -C server uploadpack.allowfilter 1 && + test_config -C server uploadpack.allowanysha1inwant 1 && + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && + + # Ensure no fetches. + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff --raw -M HEAD^ HEAD && + ! test_path_exists trace +' + +test_expect_success 'diff --break-rewrites fetches only if necessary, and batches blobs if it does' ' + test_when_finished "rm -rf server client trace" && + + test_create_repo server && + echo a >server/a && + printf "b\nb\nb\nb\nb\n" >server/b && + git -C server add a b && + git -C server commit -m x && + printf "c\nc\nc\nc\nc\n" >server/b && + git -C server commit -a -m x && + + test_config -C server uploadpack.allowfilter 1 && + test_config -C server uploadpack.allowanysha1inwant 1 && + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && + + # Ensure no fetches. + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff --raw -M HEAD^ HEAD && + ! test_path_exists trace && + + # But with --break-rewrites, ensure that there is exactly 1 negotiation + # by checking that there is only 1 "done" line sent. ("done" marks the + # end of negotiation.) + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff --break-rewrites --raw -M HEAD^ HEAD && + grep "git> done" trace >done_lines && + test_line_count = 1 done_lines +' + test_done