From patchwork Tue Jun 1 14:58:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12291471 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95718C47080 for ; Tue, 1 Jun 2021 14:58:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 73791613AE for ; Tue, 1 Jun 2021 14:58:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234265AbhFAPAX (ORCPT ); Tue, 1 Jun 2021 11:00:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234066AbhFAPAW (ORCPT ); Tue, 1 Jun 2021 11:00:22 -0400 Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D37DC06174A for ; Tue, 1 Jun 2021 07:58:39 -0700 (PDT) Received: by mail-wm1-x330.google.com with SMTP id h12-20020a05600c350cb029019fae7a26cdso54004wmq.5 for ; Tue, 01 Jun 2021 07:58:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=K+zEts2Wvn2HaydD+45RHzTc8z4nJBbPd9CZ4dJRpC4=; b=m+8r34GiRYNTk5IL27efXSCXSJV8xNWX0lSgWukQX2IgD6IwAiDTvNor8s+aLIU186 RnLk0dz+sNQ9UPA63e13csX6Q+DuKbs6TpzOglMxO/zsTVIbpWJCOAbaow0oFrwCV9lf EIF1rcoPSb/rI7Zn0acKyUpLmBtVqhsk401ilNP81IJz+dQpMOfxuBgdS9UHD/YyWjIW 1fhUnhbY0VTZL5K7jJEsxr3xRXWY01Qb4aIFw3tGSKzU7rKuf2rhEA2a6KBE8eQtFfCx D778o6wqkrY9Xve2OxKnW+lPefVOlh3ya78eHU0MjK/YnR3Q5eMNkp+miywO3k6nKCLl 0H2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=K+zEts2Wvn2HaydD+45RHzTc8z4nJBbPd9CZ4dJRpC4=; b=NHa/2rNf6jZF4+ar0YaEXcrQdhQDaVLX/82Wtpqb8wEHf8FlAJfBLxvguGVjpk4IWT cYMe8mKz7ZJpiCTTCS8atP3ONS/IsaOxib7m9zrEnwpj2sCxfi/jFuBsyCfNJ/qRC0a0 Cls3E6HllZTmkeEgzqbw0b56IIcQV1E6z/LHmolG58tAZRn/3wv1EmiaSgcSn6w3+TR9 Zys9o2trWVZHqw1xzZE0QQiqNgXBId+NdJN/yFSNWt4zZ9tWHVEjwr75Zo4h9gfkwV1V Nbv2PgAFzckiTkbYU2XFQG+RNxkuhY4ZhkIfavW4vkTUVwOH0NGErclI52uGhVgr2F5V KKYQ== X-Gm-Message-State: AOAM533gDtDcEG0OxzyP/KORD3gLNtiGDE9gSTmdX+ToVnoIH2Tss71c AIB1KT4kBD98YQY9RQZnivHPXfwUpG8= X-Google-Smtp-Source: ABdhPJzEHWltzCSU8yCoScNKUtjT8sxzEU45SUglfZ2rFVvUknEsz1cwoeZsvV9hF4OM0MS9e3EbbQ== X-Received: by 2002:a1c:f219:: with SMTP id s25mr326584wmc.31.1622559518229; Tue, 01 Jun 2021 07:58:38 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id c15sm3467986wrd.49.2021.06.01.07.58.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Jun 2021 07:58:37 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Tue, 01 Jun 2021 14:58:31 +0000 Subject: [PATCH v2 1/5] merge-ort: replace string_list_df_name_compare with faster alternative MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , =?utf-8?b?UmVuw6k=?= Scharfe , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Gathering accumulated times from trace2 output on the mega-renames testcase, I saw the following timings (where I'm only showing a few lines to highlight the portions of interest): 10.120 : label:incore_nonrecursive 4.462 : ..label:process_entries 3.143 : ....label:process_entries setup 2.988 : ......label:plist special sort 1.305 : ....label:processing 2.604 : ..label:collect_merge_info 2.018 : ..label:merge_start 1.018 : ..label:renames In the above output, note that the 4.462 seconds for process_entries was split as 3.143 seconds for "process_entries setup" and 1.305 seconds for "processing" (and a little time for other stuff removed from the highlight). Most of the "process_entries setup" time was spent on "plist special sort" which corresponds to the following code: trace2_region_enter("merge", "plist special sort", opt->repo); plist.cmp = string_list_df_name_compare; string_list_sort(&plist); trace2_region_leave("merge", "plist special sort", opt->repo); In other words, in a merge strategy that would be invoked by passing "-sort" to either rebase or merge, sorting an array takes more time than anything else. Serves me right for naming my merge strategy this way. Rewrite the comparison function in a way that does not require finding out the lengths of the strings when comparing them. While at it, tweak the code for our specific case -- no need to handle a variety of modes, for example. The combination of these changes reduced the time spent in "plist special sort" by ~25% in the mega-renames case. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.622 s ± 0.059 s 5.235 s ± 0.042 s mega-renames: 10.127 s ± 0.073 s 9.419 s ± 0.107 s just-one-mega: 500.3 ms ± 3.8 ms 480.1 ms ± 3.9 ms Signed-off-by: Elijah Newren --- merge-ort.c | 72 ++++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 52 insertions(+), 20 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index 142d44d74d63..0fe2eaf02eb2 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2746,31 +2746,63 @@ static int detect_and_process_renames(struct merge_options *opt, /*** Function Grouping: functions related to process_entries() ***/ -static int string_list_df_name_compare(const char *one, const char *two) +static int sort_dirs_next_to_their_children(const char *one, const char *two) { - int onelen = strlen(one); - int twolen = strlen(two); + unsigned char c1, c2; + /* - * Here we only care that entries for D/F conflicts are - * adjacent, in particular with the file of the D/F conflict - * appearing before files below the corresponding directory. - * The order of the rest of the list is irrelevant for us. + * Here we only care that entries for directories appear adjacent + * to and before files underneath the directory. We can achieve + * that by pretending to add a trailing slash to every file and + * then sorting. In other words, we do not want the natural + * sorting of + * foo + * foo.txt + * foo/bar + * Instead, we want "foo" to sort as though it were "foo/", so that + * we instead get + * foo.txt + * foo + * foo/bar + * To achieve this, we basically implement our own strcmp, except that + * if we get to the end of either string instead of comparing NUL to + * another character, we compare '/' to it. + * + * If this unusual "sort as though '/' were appended" perplexes + * you, perhaps it will help to note that this is not the final + * sort. write_tree() will sort again without the trailing slash + * magic, but just on paths immediately under a given tree. * - * To achieve this, we sort with df_name_compare and provide - * the mode S_IFDIR so that D/F conflicts will sort correctly. - * We use the mode S_IFDIR for everything else for simplicity, - * since in other cases any changes in their order due to - * sorting cause no problems for us. + * The reason to not use df_name_compare directly was that it was + * just too expensive (we don't have the string lengths handy), so + * I had to reimplement it. */ - int cmp = df_name_compare(one, onelen, S_IFDIR, - two, twolen, S_IFDIR); + /* - * Now that 'foo' and 'foo/bar' compare equal, we have to make sure - * that 'foo' comes before 'foo/bar'. + * NOTE: This function will never be called with two equal strings, + * because it is used to sort the keys of a strmap, and strmaps have + * unique keys by construction. That simplifies our c1==c2 handling + * below. */ - if (cmp) - return cmp; - return onelen - twolen; + + while (*one && (*one == *two)) { + one++; + two++; + } + + c1 = *one; + if (!c1) + c1 = '/'; + + c2 = *two; + if (!c2) + c2 = '/'; + + if (c1 == c2) { + /* Getting here means one is a leading directory of the other */ + return (*one) ? 1 : -1; + } else + return c1-c2; } static int read_oid_strbuf(struct merge_options *opt, @@ -3481,7 +3513,7 @@ static void process_entries(struct merge_options *opt, trace2_region_leave("merge", "plist copy", opt->repo); trace2_region_enter("merge", "plist special sort", opt->repo); - plist.cmp = string_list_df_name_compare; + plist.cmp = sort_dirs_next_to_their_children; string_list_sort(&plist); trace2_region_leave("merge", "plist special sort", opt->repo); From patchwork Tue Jun 1 14:58:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12291475 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4FFADC4708F for ; Tue, 1 Jun 2021 14:58:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2B983613A9 for ; Tue, 1 Jun 2021 14:58:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234343AbhFAPAZ (ORCPT ); Tue, 1 Jun 2021 11:00:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32922 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234238AbhFAPAW (ORCPT ); Tue, 1 Jun 2021 11:00:22 -0400 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 121D4C061756 for ; Tue, 1 Jun 2021 07:58:40 -0700 (PDT) Received: by mail-wr1-x431.google.com with SMTP id c5so3500432wrq.9 for ; Tue, 01 Jun 2021 07:58:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=wMVXDPLGBMLYT6OBJ2rWFehw8Rz0HAUKh3+HU5yscHk=; b=Pkf213Lvk1aVJjQkS/96ika+FK2ZXakm6Yt2yyohBdzYbg8GZB1E7lytg6hbsGtjP/ LkGgI56Jx2DDPR50SN6gRnyJyEeX9zfNuQhAUFaAhi8mkTUa2D/GIk0r2+ZIf7Jhi02L YKvrSc+mjf3WP43R4lzxqGNyj7AVcokIKEc0pdpUBiNZzoNOeojLqwLKYNe+l9/LaNhd 8CuOdxyaQkuCTUiyNNQBnccy3mAc/WGBt7UELb+zvaG9aygU/PrQOmtxxZVKE7Qe5zps 06/QPNeRtfJ/jZVd7r4GfHir1UZI1tUqpg7NxjS053yy8YfdMI2OEmkdUY1qARWL9qMR 4uRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=wMVXDPLGBMLYT6OBJ2rWFehw8Rz0HAUKh3+HU5yscHk=; b=CX/KF6gSgCEBRWwoV554juWjXQ84sI+JsecrHMGLalPeSkrEur6xgT4pnyFBSkFpfO +FS4KEVAFuw92OL9t0GC6ebOcecCZH9GDAOGWOfHOcWManlKngMPECSHEK1htzicZm+g uMjOIoMj2hrsegTEkiCE+ISuOAC/ZkA61QOoDdreuiB9rhH9Tp9qFoDihakVNKocycS/ aOYVWvbtsbTGsryP+fGPHkJWjwLnIGKymoROc0zlHDK4uYW3TwOkn2nn+qn5q8z7RnmD 972shF9KDGgnxZEGF9PRoWQwgqaipzWdGEPK9CxvIE7ivI72v1msFi5Ih4VEivTD6gBq SMvQ== X-Gm-Message-State: AOAM530xeO6/8FLOto5CulkDQpJ8MzMROYaSjAM2zYFW2Y8QhOiPTD3P ttL+KRgSY3L7Jh4K0T3MwDm6L13Xv3s= X-Google-Smtp-Source: ABdhPJygjIcnYpUV6FoudJbfOAcHWBw3nLyQASnlw+TLtXBpe1VThNZDrJJpSkfREO7whzLD3qZ+zg== X-Received: by 2002:adf:9385:: with SMTP id 5mr20250426wrp.53.1622559518708; Tue, 01 Jun 2021 07:58:38 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id n6sm17616861wmq.34.2021.06.01.07.58.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Jun 2021 07:58:38 -0700 (PDT) Message-Id: <38713ed482736af8d1dea4d180651fd919837109.1622559516.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 01 Jun 2021 14:58:32 +0000 Subject: [PATCH v2 2/5] diffcore-rename: avoid unnecessary strdup'ing in break_idx Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , =?utf-8?b?UmVuw6k=?= Scharfe , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren The keys of break_idx are strings from the diff_filepairs of diff_queued_diff. break_idx is only used in location_rename_dst(), and that usage is always before any free'ing of the pairs (and thus the strings in the pairs). As such, there is no need to strdup these keys; we can just reuse the existing strings as-is. The merge logic doesn't make use of break detection, so this does not affect the performance of any of my testcases. It was just a minor unrelated optimization noted in passing while looking at the code. Signed-off-by: Elijah Newren --- diffcore-rename.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 3375e24659ea..e333a6d64791 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -54,7 +54,7 @@ static void register_rename_src(struct diff_filepair *p) if (p->broken_pair) { if (!break_idx) { break_idx = xmalloc(sizeof(*break_idx)); - strintmap_init(break_idx, -1); + strintmap_init_with_options(break_idx, -1, NULL, 0); } strintmap_set(break_idx, p->one->path, rename_dst_nr); } From patchwork Tue Jun 1 14:58:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12291473 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2CEEEC47093 for ; Tue, 1 Jun 2021 14:58:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 092DD613A9 for ; Tue, 1 Jun 2021 14:58:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234297AbhFAPAY (ORCPT ); Tue, 1 Jun 2021 11:00:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32926 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234177AbhFAPAW (ORCPT ); Tue, 1 Jun 2021 11:00:22 -0400 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AED09C06175F for ; Tue, 1 Jun 2021 07:58:40 -0700 (PDT) Received: by mail-wr1-x42a.google.com with SMTP id n4so14698217wrw.3 for ; Tue, 01 Jun 2021 07:58:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=BrtiK6c6E7EWL2CwKQrOLK27at71WBUrJ7plYPI5xiY=; b=Y1SqhhieNkP0AGHKmEGbkaNikvNHFPLg+ChnQ81WnJ7tg57R2QI2cBvvPmHrLK2h8E v6O4gI6jHpCczZpF4ZGpvHU4r3aQibpDeYMmJU0ez9nlILSiOZ07dBrEwHt4d3nDGo7N dd+r7E6d44QNWqlT5wiCf7p8UzJfUV3Tq1ie2uxfl6i/UIF5m2dKcKc9qqCzRid2Bjrl BAOVNE/KkaHkvp5lIJ+m1J6IiQzrNWAMW9Bf46Met+6k8WZYA8gdlZiai7kz0CUSpkvs WD2geSLqBMdaNzPpMP9dgZmfcyRMn5oweeWd4W8uRan9falvfYiAqN0Yfk6BNh5ra6IX NI+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=BrtiK6c6E7EWL2CwKQrOLK27at71WBUrJ7plYPI5xiY=; b=WQr7/jMzi37IZu/8w0CeL0L+pzyr2QmSTXSS2RjkmusvSlEULfsANR/WEXmJ/5VDkx 5+OE2BkkEhXObcBi127+OHY3CgiQdp75kDoPtFJb9epflMbUW6+U4qq3Q98TxzRgybOi MC4QdvVTSBnHy4+1BTHu4l5UfJaZ0CNc9025m9xf/qtH7YgzPugCfu8pKVcWhjwv8/be 6gUK2UD8Fvzve9i7bFe4hQjYY20zxd2QRdnmTF8V2Bf+cMP/l7IthX9P08hx2YyR9GKg DPWmzuZW9EL0mQ6V/FdqjE/S5TjflAp19iXZXTVD7xHnkHYaQGcPX1BoxlSCNFkdk0hU yjvQ== X-Gm-Message-State: AOAM532rLjZKhh6JVSUBEr/f0V1rJw+0/AgBtKRsV/RfFkaONcBHW7qI SdPuAjriyqq/u8eTuTROMr0HLeyGKxU= X-Google-Smtp-Source: ABdhPJwhf1V71Kjl1iu61NSeKOX40CKEYIKrz71ukLXrT/BtvZC7rJKBJ0A/bL2zaRhuWCaFD4z2eA== X-Received: by 2002:a5d:638b:: with SMTP id p11mr28136327wru.90.1622559519307; Tue, 01 Jun 2021 07:58:39 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a11sm3478289wrr.48.2021.06.01.07.58.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Jun 2021 07:58:39 -0700 (PDT) Message-Id: <45e1de5fe7808f5297d5e33d14c239d74de33bdc.1622559516.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 01 Jun 2021 14:58:33 +0000 Subject: [PATCH v2 3/5] diffcore-rename: enable limiting rename detection to relevant destinations Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , =?utf-8?b?UmVuw6k=?= Scharfe , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Our former optimizations focused on limiting rename detection to a pre-specified set of relevant sources. This was because the merge logic only had a way of knowing which sources were relevant. However, other callers of rename detection might benefit from being able to limit rename detection to a known set of relevant destinations. In particular, a properly implemented `git log --follow` might benefit from such an ability. Since the code to implement such limiting is very similar to what we've already done, just implement it now even though we do not yet have any callers making use of this ability. Signed-off-by: Elijah Newren --- diffcore-rename.c | 48 +++++++++++++++++++++++++++++++++++++++++------ diffcore.h | 2 ++ merge-ort.c | 1 + 3 files changed, 45 insertions(+), 6 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index e333a6d64791..8ff83a9f3b99 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -372,6 +372,7 @@ struct dir_rename_info { struct strmap dir_rename_guess; struct strmap *dir_rename_count; struct strintmap *relevant_source_dirs; + struct strset *relevant_destination_dirs; unsigned setup; }; @@ -491,8 +492,11 @@ static void update_dir_rename_counts(struct dir_rename_info *info, !strintmap_contains(info->relevant_source_dirs, old_dir)) break; - /* Get new_dir */ + /* Get new_dir, skip if its directory isn't relevant. */ dirname_munge(new_dir); + if (info->relevant_destination_dirs && + !strset_contains(info->relevant_destination_dirs, new_dir)) + break; /* * When renaming @@ -567,6 +571,7 @@ static void update_dir_rename_counts(struct dir_rename_info *info, static void initialize_dir_rename_info(struct dir_rename_info *info, struct strintmap *relevant_sources, + struct strset *relevant_destinations, struct strintmap *dirs_removed, struct strmap *dir_rename_count, struct strmap *cached_pairs) @@ -575,7 +580,7 @@ static void initialize_dir_rename_info(struct dir_rename_info *info, struct strmap_entry *entry; int i; - if (!dirs_removed && !relevant_sources) { + if (!dirs_removed && !relevant_sources && !relevant_destinations) { info->setup = 0; return; } @@ -589,6 +594,18 @@ static void initialize_dir_rename_info(struct dir_rename_info *info, strintmap_init_with_options(&info->idx_map, -1, NULL, 0); strmap_init_with_options(&info->dir_rename_guess, NULL, 0); + /* Setup info->relevant_destination_dirs */ + info->relevant_destination_dirs = NULL; + if (relevant_destinations) { + info->relevant_destination_dirs = xmalloc(sizeof(struct strset)); + strset_init(info->relevant_destination_dirs); + strset_for_each_entry(relevant_destinations, &iter, entry) { + char *dirname = get_dirname(entry->key); + strset_add(info->relevant_destination_dirs, dirname); + free(dirname); + } + } + /* Setup info->relevant_source_dirs */ info->relevant_source_dirs = NULL; if (dirs_removed || !relevant_sources) { @@ -700,6 +717,12 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info, FREE_AND_NULL(info->relevant_source_dirs); } + /* relevant_destination_dirs */ + if (info->relevant_destination_dirs) { + strset_clear(info->relevant_destination_dirs); + FREE_AND_NULL(info->relevant_destination_dirs); + } + /* dir_rename_count */ if (!keep_dir_rename_count) { partial_clear_dir_rename_count(info->dir_rename_count); @@ -827,6 +850,7 @@ static int find_basename_matches(struct diff_options *options, int minimum_score, struct dir_rename_info *info, struct strintmap *relevant_sources, + struct strset *relevant_destinations, struct strintmap *dirs_removed) { /* @@ -949,9 +973,15 @@ static int find_basename_matches(struct diff_options *options, if (rename_dst[dst_index].is_rename) continue; /* already used previously */ - /* Estimate the similarity */ one = rename_src[src_index].p->one; two = rename_dst[dst_index].p->two; + + /* Skip irrelevant destinations */ + if (relevant_destinations && + !strset_contains(relevant_destinations, two->path)) + continue; + + /* Estimate the similarity */ score = estimate_similarity(options->repo, one, two, minimum_score, skip_unmodified); @@ -1258,6 +1288,7 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info, void diffcore_rename_extended(struct diff_options *options, struct strintmap *relevant_sources, + struct strset *relevant_destinations, struct strintmap *dirs_removed, struct strmap *dir_rename_count, struct strmap *cached_pairs) @@ -1376,8 +1407,8 @@ void diffcore_rename_extended(struct diff_options *options, /* Preparation for basename-driven matching. */ trace2_region_enter("diff", "dir rename setup", options->repo); initialize_dir_rename_info(&info, relevant_sources, - dirs_removed, dir_rename_count, - cached_pairs); + relevant_destinations, dirs_removed, + dir_rename_count, cached_pairs); trace2_region_leave("diff", "dir rename setup", options->repo); /* Utilize file basenames to quickly find renames. */ @@ -1386,6 +1417,7 @@ void diffcore_rename_extended(struct diff_options *options, min_basename_score, &info, relevant_sources, + relevant_destinations, dirs_removed); trace2_region_leave("diff", "basename matches", options->repo); @@ -1441,6 +1473,10 @@ void diffcore_rename_extended(struct diff_options *options, if (rename_dst[i].is_rename) continue; /* exact or basename match already handled */ + if (relevant_destinations && + !strset_contains(relevant_destinations, two->path)) + continue; + m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST]; for (j = 0; j < NUM_CANDIDATE_PER_DST; j++) m[j].dst = -1; @@ -1574,5 +1610,5 @@ void diffcore_rename_extended(struct diff_options *options, void diffcore_rename(struct diff_options *options) { - diffcore_rename_extended(options, NULL, NULL, NULL, NULL); + diffcore_rename_extended(options, NULL, NULL, NULL, NULL, NULL); } diff --git a/diffcore.h b/diffcore.h index 533b30e21e7f..435c7094f403 100644 --- a/diffcore.h +++ b/diffcore.h @@ -10,6 +10,7 @@ struct diff_options; struct repository; struct strintmap; struct strmap; +struct strset; struct userdiff_driver; /* This header file is internal between diff.c and its diff transformers @@ -180,6 +181,7 @@ void diffcore_break(struct repository *, int); void diffcore_rename(struct diff_options *); void diffcore_rename_extended(struct diff_options *options, struct strintmap *relevant_sources, + struct strset *relevant_destinations, struct strintmap *dirs_removed, struct strmap *dir_rename_count, struct strmap *cached_pairs); diff --git a/merge-ort.c b/merge-ort.c index 0fe2eaf02eb2..28951c35ddc2 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2568,6 +2568,7 @@ static void detect_regular_renames(struct merge_options *opt, trace2_region_enter("diff", "diffcore_rename", opt->repo); diffcore_rename_extended(&diff_opts, &renames->relevant_sources[side_index], + NULL, &renames->dirs_removed[side_index], &renames->dir_rename_count[side_index], &renames->cached_pairs[side_index]); From patchwork Tue Jun 1 14:58:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12291479 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF895C47092 for ; Tue, 1 Jun 2021 14:58:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CBDAC613A9 for ; Tue, 1 Jun 2021 14:58:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234353AbhFAPA0 (ORCPT ); Tue, 1 Jun 2021 11:00:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32926 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234210AbhFAPAW (ORCPT ); Tue, 1 Jun 2021 11:00:22 -0400 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2E345C061574 for ; Tue, 1 Jun 2021 07:58:41 -0700 (PDT) Received: by mail-wr1-x42b.google.com with SMTP id z8so9563505wrp.12 for ; Tue, 01 Jun 2021 07:58:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=LODumI4lWebuNovaYpML51dqNMvU7NKt7WTC3uOrxLw=; b=JnHzWbHNooNnFMTpngm6gu0B8nMYmqYbIozYnX1QYtuQCU7vkE88hVlmwtS4b0rfhV p9BfMbcTUTUZiWk9aS/Elr8J/6Ke8JEoC8422mDuhzUt1I/mgQOjq54pfw865IJ21/6w l3Q8eVH7wQIp/hVa1ium4EgQvsb9/Laf5TlWn6pfTfi5qrL62Pq+eN6lW9zm0N7ecBVz XG5vnCUbHE3aAzokLzzgrIsEa6iTzCcr7AGe6xMpJvMEjULa7L8XbywzTSm50kBpwafv sqxuyMwuJu5j4HhuYsqeIx7iki+6G0KbM0c7Yteiyu4l6ycybRnxTV7DxzaFaNzCaY/y Hsyg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=LODumI4lWebuNovaYpML51dqNMvU7NKt7WTC3uOrxLw=; b=ISke0EZw+uvg3JdonH58nZ/aK+EJKJEvnMJ4uMX4HEJk2l/1Mg5hT3txbrQL4WUVHe +EJNuCslFbXsAc7H94g/d1kPZqlazW8nZWKC6etbo0IFeVVRqIyJiK78DQ+DnLaM9+BQ o7sOeWLyqbbEb6QzhaUkh7UlONKOj7pRPqrLAW/ogRU9kGrEKlhxx7jcS1HWcvlneadu c+xMawSAXV95NdRyulgP+455jwLTCmc46jpnIaD9KClJ99Zv3i+7p6+RgHMfK+qCG6uD 3R44o4rirrfyNEAtaFRU8j6nQhMk/CmuJ3crVofMsC+HF9yXanS1xzV8a/NQcMxZWnuZ 5dOw== X-Gm-Message-State: AOAM533votFSfg5mf9FLYJC9L0nELOqgg81sLX1WpB5SB1Xrcnvrw1Vp tbJtgljavJ5FHNHmABdlv5H0W9nmT1U= X-Google-Smtp-Source: ABdhPJyIkOKFVaLZDYFqiinUTc0DFc/pRNuOdCNcOUz7+3oTOgvKEFYrmjF0JsC6MnPWgBvNJ+CcVQ== X-Received: by 2002:adf:ed03:: with SMTP id a3mr5752322wro.166.1622559519819; Tue, 01 Jun 2021 07:58:39 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l3sm18662277wmh.2.2021.06.01.07.58.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Jun 2021 07:58:39 -0700 (PDT) Message-Id: <2f26d7e935c08ae792c5c8ae00a1f897fc5b1a5a.1622559516.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 01 Jun 2021 14:58:34 +0000 Subject: [PATCH v2 4/5] Fix various issues found in comments Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , =?utf-8?b?UmVuw6k=?= Scharfe , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren A random hodge-podge of incorrect or out-of-date comments that I found: * t6423 had a comment that has referred to the wrong test for years; fix it to refer to the right one. * diffcore-rename had a FIXME comment meant to remind myself to investigate if I could make another code change. I later investigated and removed the FIXME, but while cherry-picking the patch to submit upstream I missed the later update. Remove the comment now. * merge-ort had the early part of a comment for a function; I had meant to include the more involved description when I updated the function. Update the comment now. Signed-off-by: Elijah Newren --- diffcore-rename.c | 2 +- merge-ort.c | 8 +++++--- t/t6423-merge-rename-directories.sh | 2 +- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 8ff83a9f3b99..c9f7d59cf62b 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -1579,7 +1579,7 @@ void diffcore_rename_extended(struct diff_options *options, /* all the usual ones need to be kept */ diff_q(&outq, p); else - /* no need to keep unmodified pairs; FIXME: remove earlier? */ + /* no need to keep unmodified pairs */ pair_to_free = p; if (pair_to_free) diff --git a/merge-ort.c b/merge-ort.c index 28951c35ddc2..14b3e14bbe47 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2533,7 +2533,7 @@ static int compare_pairs(const void *a_, const void *b_) return strcmp(a->one->path, b->one->path); } -/* Call diffcore_rename() to compute which files have changed on given side */ +/* Call diffcore_rename() to update deleted/added pairs into rename pairs */ static void detect_regular_renames(struct merge_options *opt, unsigned side_index) { @@ -2587,8 +2587,10 @@ static void detect_regular_renames(struct merge_options *opt, } /* - * Get information of all renames which occurred in 'side_pairs', discarding - * non-renames. + * Get information of all renames which occurred in 'side_pairs', making use + * of any implicit directory renames in side_dir_renames (also making use of + * implicit directory renames rename_exclusions as needed by + * check_for_directory_rename()). Add all (updated) renames into result. */ static int collect_renames(struct merge_options *opt, struct diff_queue_struct *result, diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh index be84d22419d9..e834b7e6efe0 100755 --- a/t/t6423-merge-rename-directories.sh +++ b/t/t6423-merge-rename-directories.sh @@ -454,7 +454,7 @@ test_expect_success '1f: Split a directory into two other directories' ' # the directory renamed, but the files within it. (see 1b) # # If renames split a directory into two or more others, the directory -# with the most renames, "wins" (see 1c). However, see the testcases +# with the most renames, "wins" (see 1f). However, see the testcases # in section 2, plus testcases 3a and 4a. ########################################################################### From patchwork Tue Jun 1 14:58:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12291477 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7228C47093 for ; Tue, 1 Jun 2021 14:58:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 968416128A for ; Tue, 1 Jun 2021 14:58:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234363AbhFAPA1 (ORCPT ); Tue, 1 Jun 2021 11:00:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234328AbhFAPAY (ORCPT ); Tue, 1 Jun 2021 11:00:24 -0400 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C71CEC06174A for ; Tue, 1 Jun 2021 07:58:41 -0700 (PDT) Received: by mail-wr1-x434.google.com with SMTP id c3so14654848wrp.8 for ; Tue, 01 Jun 2021 07:58:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=MxkjSxNvX20Vsj7SsSwYxyvAkMnX7kw4YWWOD/9s7zg=; b=T7qhUoF07n130PRcalV4dMkfMSbwhU+MHhqqKyqtBjU8Zu4tqTfUXWgVKXSG10rQlb pVyKgKsmuzXHv8z+KLJM7fW7LW87B2QNEr+otjZXipiN4XOTUgRzP9ajafS2BzkKqBiE 6PTWLX/t3RMtmQtI2V3XsOjYaz8OuRf7NwjJiEMfgnE04PfCR2forpVn6w3nc97cK91w ee34mrvBChfkLosYlR1wHXpSRi5TfkyQFyK4wO+zjldpA4HvuUQOTV7S707ELnHc9SLt 02oMvKrydOD/wvULEUPMPF2cTX1B70AThkVshMddiMbFkaGciOxh299id7uDGht13wZH f9Sg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=MxkjSxNvX20Vsj7SsSwYxyvAkMnX7kw4YWWOD/9s7zg=; b=E2VOdf4KR5fi4XK0MB6fWSa/k9Yw903FohewfRd+R6v19qkVFUArhPDqSxrTe3oJjU 7j9rQKkuGdDT/cr0dvRUzqGK9A+mkSvLDgm30P3GY2Dqw9nDf6ELYyp4cL5EhyMK/lae d+beKZfLFFkVE79qMkCN7bK4sUfaKr1v9vhshpeid4uWaPnvBsy6HIJuPekd6u58eM6c rTHp+Rccs1n+rx08U6bEhn3ev3vT/AG8ycRvkNdMFZcdZ56k5hoU0YuSFoaXfd7gAigC sKycmM8ShM68fnkGVXZdmkMhiQiMycjL/qgKgkAzkYBF9Jb4ma24XQ/6lj8y4TA5x2YR aG6A== X-Gm-Message-State: AOAM530VS3T2Pv+c53pPaQR4AqDE150lgaMXLgfpJHwXX+srWupdO91+ z1p7X/OYby4YgBF2FFkUyAf8e0bhpaU= X-Google-Smtp-Source: ABdhPJyurH/OLWjc482fQ277BciHwtmG1/76sde3S4mQ0UA8Q99LSmS97SuMXq920CoI/3YJSsLl9g== X-Received: by 2002:adf:f448:: with SMTP id f8mr12507450wrp.320.1622559520457; Tue, 01 Jun 2021 07:58:40 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id h9sm18528483wmb.35.2021.06.01.07.58.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Jun 2021 07:58:40 -0700 (PDT) Message-Id: <7156f26ab29919344c929be540d404dce2d5fe50.1622559516.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 01 Jun 2021 14:58:35 +0000 Subject: [PATCH v2 5/5] merge-ort: miscellaneous touch-ups Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , =?utf-8?b?UmVuw6k=?= Scharfe , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Add some notes in the code about invariants with match_mask when adding pairs. Also add a comment that seems to have been left out in my work of pushing these changes upstream. Signed-off-by: Elijah Newren --- merge-ort.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index 14b3e14bbe47..d428b73dedd4 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -765,6 +765,7 @@ static void add_pair(struct merge_options *opt, int names_idx = is_add ? side : 0; if (is_add) { + assert(match_mask == 0 || match_mask == 6); if (strset_contains(&renames->cached_target_names[side], pathname)) return; @@ -772,6 +773,8 @@ static void add_pair(struct merge_options *opt, unsigned content_relevant = (match_mask == 0); unsigned location_relevant = (dir_rename_mask == 0x07); + assert(match_mask == 0 || match_mask == 3 || match_mask == 5); + /* * If pathname is found in cached_irrelevant[side] due to * previous pick but for this commit content is relevant, @@ -3483,6 +3486,8 @@ static void process_entry(struct merge_options *opt, */ if (!ci->merged.clean) strmap_put(&opt->priv->conflicted, path, ci); + + /* Record metadata for ci->merged in dir_metadata */ record_entry_for_tree(dir_metadata, path, &ci->merged); }