From patchwork Wed Mar 24 21:32:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12162427 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8CB3EC433E2 for ; Wed, 24 Mar 2021 21:33:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3D40F61A24 for ; Wed, 24 Mar 2021 21:33:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238476AbhCXVcn (ORCPT ); Wed, 24 Mar 2021 17:32:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238490AbhCXVci (ORCPT ); Wed, 24 Mar 2021 17:32:38 -0400 Received: from mail-wr1-x435.google.com (mail-wr1-x435.google.com [IPv6:2a00:1450:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D69AC06174A for ; Wed, 24 Mar 2021 14:32:37 -0700 (PDT) Received: by mail-wr1-x435.google.com with SMTP id x16so249910wrn.4 for ; Wed, 24 Mar 2021 14:32:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=2rWkcFDLnOgbEh3iq32tzYD1sEh6ZH0IimGam4RIGZE=; b=DsQpXNDAYhCbfq/KaApdsa77mp/GStDHCAPeARt8hzzNo6roX4dywzpI72oFS1bCxA Jyos7Q07Lpybuz1shjSm22XjxbnNv1MBZpAC7r6KPEPeD+lSwymctYgHjjbKrzoeZ1w3 HFSes+8QMoDh2elEJZHJxkM+sCKTGLRCrVg3XSlNACAYMGv/ruPBdRyBdX6/QPN5OzEQ xg76PVHC/cTJ12u8QE/5ezQ7ZadDYXdmSTipuXa2+zGMaghUE81o+lYiG6CB+zg46DwT nTJsWbQE3B2BuhLVM38ElqWxTBb/rpkMCH0ITUMQibDjJ2UNzwIZ9PuWsa5/JZdN+RGD 4mvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=2rWkcFDLnOgbEh3iq32tzYD1sEh6ZH0IimGam4RIGZE=; b=HLEbtSogf2eOzIkYU1JLmAEqL2P9dDGoZlY3AtPU3Y3NtgdXceNoqlceB3iIN4hcsQ +M4YBQPyb25z8z1GHtWtX7TCs3xZbEHOkKazPYH6QDTZ7JiG9jZBn9dRt5lvNBcYJqqb X66H1cqngOxCkw6trdLRWUQDTVSxX08aWNhEtCdwo54GaovKjc/cJYIaGRQWxPtxHTTj MO5YknHsyz8IlLjEjOsCtgYCmk/+Nbg8v/KuxIZAp9l3PzfgyVf38F8Jh/rv0pBP08Q0 6pxGhaKSZkcOc68ix8Q+wOtJkNhtq/a+i4ibwPfE7M3DtTlU8xbTBNICHB77t0b6Rb4h 5GEA== X-Gm-Message-State: AOAM532ToM/0jCXeFpSCXQfsLC+ruLPmQFQVZORCSvw7WDC8PkGsbmBE 01M9CK81mwKXxvBfSQBnX1ELdT5b7H8= X-Google-Smtp-Source: ABdhPJz7zmGI8HBTKBcM/7sSUF9igy3fik9SYVxpfPS8wc5gVIqVHW8WQKmh/08IlrsvT5/YON4B0A== X-Received: by 2002:a5d:58c9:: with SMTP id o9mr5545465wrf.181.1616621556322; Wed, 24 Mar 2021 14:32:36 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id s3sm3769431wmd.21.2021.03.24.14.32.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Mar 2021 14:32:35 -0700 (PDT) Message-Id: <689a7de56483fd9dbd87ad9cc358ccf671ccc1a7.1616621553.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Wed, 24 Mar 2021 21:32:27 +0000 Subject: [PATCH 1/7] merge-ort: add data structures for in-memory caching of rename detection Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren When there are many renames between the old base of a series of commits and the new base for a series of commits, the sequence of merges employed to transplant those commits (from a cherry-pick or rebase operation) will repeatedly detect the exact same renames. This is wasted effort. Add data structures which will be used to cache rename detection results, along with the initialization and deallocation of these data structures. Future commits will populate these caches, detect the appropriate circumstances when they can be used, and employ them to avoid re-detecting the same renames repeatedly. Signed-off-by: Elijah Newren --- merge-ort.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index 8258d3fd621e..0774152ea64a 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -139,6 +139,37 @@ struct rename_info { int callback_data_nr, callback_data_alloc; char *callback_data_traverse_path; + /* + * cached_pairs: Caching of renames and deletions. + * + * These are mappings recording renames and deletions of individual + * files (not directories). They are thus a map from an old + * filename to either NULL (for deletions) or a new filename (for + * renames). + */ + struct strmap cached_pairs[3]; + + /* + * cached_target_names: just the destinations from cached_pairs + * + * We sometimes want a fast lookup to determine if a given filename + * is one of the destinations in cached_pairs. cached_target_names + * is thus duplicative information, but it provides a fast lookup. + */ + struct strset cached_target_names[3]; + + /* + * cached_irrelevant: Caching of rename_sources that aren't relevant. + * + * cached_pairs records both renames and deletes. Sometimes we + * do not know if a path is a rename or a delete because we pass + * RELEVANT_LOCATION to diffcore_rename_extended() and based on + * various optimizations it returns without detecting whether that + * path is actually a rename or a delete. We need to cache such + * paths too, but separately from cached_pairs. + */ + struct strset cached_irrelevant[3]; + /* * needed_limit: value needed for inexact rename detection to run * @@ -381,6 +412,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, reinitialize ? strmap_partial_clear : strmap_clear; void (*strintmap_func)(struct strintmap *) = reinitialize ? strintmap_partial_clear : strintmap_clear; + void (*strset_func)(struct strset *) = + reinitialize ? strset_partial_clear : strset_clear; /* * We marked opti->paths with strdup_strings = 0, so that we @@ -424,6 +457,9 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, strmap_func(&renames->dir_renames[i], 0); strintmap_func(&renames->relevant_sources[i]); + strset_func(&renames->cached_target_names[i]); + strmap_func(&renames->cached_pairs[i], 1); + strset_func(&renames->cached_irrelevant[i]); } if (!reinitialize) { @@ -3675,6 +3711,12 @@ static void merge_start(struct merge_options *opt, struct merge_result *result) NULL, 0); strintmap_init_with_options(&renames->relevant_sources[i], 0, NULL, 0); + strmap_init_with_options(&renames->cached_pairs[i], + NULL, 1); + strset_init_with_options(&renames->cached_irrelevant[i], + NULL, 1); + strset_init_with_options(&renames->cached_target_names[i], + NULL, 0); } /* From patchwork Wed Mar 24 21:32:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12162425 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FD80C433DB for ; Wed, 24 Mar 2021 21:33:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C723061A1F for ; Wed, 24 Mar 2021 21:33:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238474AbhCXVcn (ORCPT ); Wed, 24 Mar 2021 17:32:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57432 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238491AbhCXVci (ORCPT ); Wed, 24 Mar 2021 17:32:38 -0400 Received: from mail-wm1-x332.google.com (mail-wm1-x332.google.com [IPv6:2a00:1450:4864:20::332]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3146BC061763 for ; Wed, 24 Mar 2021 14:32:38 -0700 (PDT) Received: by mail-wm1-x332.google.com with SMTP id n11-20020a05600c4f8bb029010e5cf86347so3403202wmq.1 for ; Wed, 24 Mar 2021 14:32:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=7nRedVQaoNiBIqDox5yv7uQyk1IOzoWKCIVl/e4A2oo=; b=Naokp0511WwvonJZeH4ujKsAALYPawDa46mLn3Wg9Z18A1YLh2vOfacLA0z+BKyibu y2S4nf3ZfZwPYO3EjU0S1wox6SZXzREeRblIGutqKOJ8TUfPLrPETsfKNJAAP8DvCl1r MxuCnPFERbHWDT6slYD83YCXvcCTsqGTZLSesjbKY/JpXAJlQNylRVeGJgR65Zouf22p jn2PY/2xGE+5poqw0eUM/AlIbGpPcxKouL26YXsBFfDJZDAGlMFIYYZeyu0t6MCCSbXH 6R3yrMmH3FtaS7eOPGvkx/nkS/QZ3j2bi3LojfJMktc6F4t23zUqZayf3O9G/vjAuCLA NUzw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=7nRedVQaoNiBIqDox5yv7uQyk1IOzoWKCIVl/e4A2oo=; b=o/nxxKdnju32/tqEDJ9THuBN0FVt6NCg06hOzc1N2m6pq9dF7n2Yon+spdkox74/5/ OPx7jSaZ8pRxOv7HCbdUbbabLLKU7WJaNwEzg8XnA31Dw799nrDAOFpvMDn8DmkQdRWr FUBhcelhpq5cN3Du3IGbid2emzl1ewaiTgRaDT6gINDQlZBACMG2EmjBNEXJYZB9x5gG BSbkBpQsHUN9iHXI9oYtK3XzdLIU15yXYn8XSxqQBer6ihudS0KaaJj8u0XEL20jUnIy MYzRz8tuqA+ebDYsbrL8a5ia16Tdc09AhwVirWDsH/qY6Meod4VHTLCZPYhLMXZW2ZdR p1vw== X-Gm-Message-State: AOAM530bx7UXz7aE09f6SxcvpfEpMMp4YJNK3VrHXhBKD5EcNhn0bj3q taC+K+Lk5HTRsNGml4CfXpk8kdvNPZg= X-Google-Smtp-Source: ABdhPJzrFut4ecnEqtqn/bTiTumTgv6LU9VvTrtNUxzukBbs7n64Y43AR5kN/SJWU2F/YARfEw4NDQ== X-Received: by 2002:a05:600c:3647:: with SMTP id y7mr4808743wmq.17.1616621556883; Wed, 24 Mar 2021 14:32:36 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id e8sm3762716wme.14.2021.03.24.14.32.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Mar 2021 14:32:36 -0700 (PDT) Message-Id: <5f34332c96053f28eeae0b7ba43e4c65c7896899.1616621553.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Wed, 24 Mar 2021 21:32:28 +0000 Subject: [PATCH 2/7] merge-ort: populate caches of rename detection results Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Fill in cache_pairs, cached_target_names, and cached_irrelevant based on rename detection results. Future commits will make use of these values. Signed-off-by: Elijah Newren --- merge-ort.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 52 insertions(+), 2 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index 0774152ea64a..2303d88e6a92 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2333,6 +2333,47 @@ static void resolve_diffpair_statuses(struct diff_queue_struct *q) } } +static void possibly_cache_new_pair(struct rename_info *renames, + struct diff_filepair *p, + unsigned side, + char *new_path) +{ + char *old_value; + + if (!new_path) { + int val = strintmap_get(&renames->relevant_sources[side], + p->one->path); + if (val == 0) { + assert(p->status == 'D'); + strset_add(&renames->cached_irrelevant[side], + p->one->path); + } + if (val <= 0) + return; + } + if (p->status == 'D') { + /* + * If we already had this delete, we'll just set it's value + * to NULL again, so no harm. + */ + strmap_put(&renames->cached_pairs[side], p->one->path, NULL); + } else if (p->status == 'R') { + if (!new_path) + new_path = p->two->path; + new_path = xstrdup(new_path); + old_value = strmap_put(&renames->cached_pairs[side], + p->one->path, new_path); + strset_add(&renames->cached_target_names[side], new_path); + free(old_value); + } else if (p->status == 'A' && new_path) { + new_path = xstrdup(new_path); + old_value = strmap_put(&renames->cached_pairs[side], + p->two->path, new_path); + strset_add(&renames->cached_target_names[side], new_path); + assert(!old_value); + } +} + static int compare_pairs(const void *a_, const void *b_) { const struct diff_filepair *a = *((const struct diff_filepair **)a_); @@ -2414,6 +2455,7 @@ static int collect_renames(struct merge_options *opt, struct diff_filepair *p = side_pairs->queue[i]; char *new_path; /* non-NULL only with directory renames */ + possibly_cache_new_pair(renames, p, side_index, NULL); if (p->status != 'A' && p->status != 'R') { diff_free_filepair(p); continue; @@ -2430,7 +2472,7 @@ static int collect_renames(struct merge_options *opt, diff_free_filepair(p); continue; } - + possibly_cache_new_pair(renames, p, side_index, new_path); if (new_path) apply_directory_rename_modifications(opt, p, new_path); @@ -3709,8 +3751,16 @@ static void merge_start(struct merge_options *opt, struct merge_result *result) NULL, 1); strmap_init_with_options(&renames->dir_renames[i], NULL, 0); + /* + * relevant_sources uses -1 for the default, because we need + * to be able to distinguish not-in-strintmap from valid + * relevant_source values from enum file_rename_relevance. + * In particular, possibly_cache_new_pair() expects a negative + * value for not-found entries. + */ strintmap_init_with_options(&renames->relevant_sources[i], - 0, NULL, 0); + -1 /* explicitly invalid */, + NULL, 0); strmap_init_with_options(&renames->cached_pairs[i], NULL, 1); strset_init_with_options(&renames->cached_irrelevant[i], From patchwork Wed Mar 24 21:32:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12162431 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93506C433E4 for ; Wed, 24 Mar 2021 21:33:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6274861A21 for ; Wed, 24 Mar 2021 21:33:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238480AbhCXVco (ORCPT ); Wed, 24 Mar 2021 17:32:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57434 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238492AbhCXVcj (ORCPT ); Wed, 24 Mar 2021 17:32:39 -0400 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C36D3C06174A for ; Wed, 24 Mar 2021 14:32:38 -0700 (PDT) Received: by mail-wr1-x42b.google.com with SMTP id v11so236836wro.7 for ; Wed, 24 Mar 2021 14:32:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=VLCX60xDN95R/hkisFQZbB0IPW28fZj3AYxdURJZx0A=; b=ZghYytPJCxVJ1/yJ6O0ThMiacFfLRV00kXzDBwsgyCWtqr74ZoGOlsPmTrTMscBPah Kft+yxYYPRWLxwa2rL74nHtcvmRBrhSUcm5T5JUQl3q7u6vaIIM7OhRVuFKmUBn4rVmz g5QK3iDpMdrc13xUoQmiT4oti4giQFhm9p3BuWve/t/RQsVY1sjal23UF+YzcBZOkxUi JRLO8MFmr5Cn3OAYAjxqsZhcykHT/8v9j+i4bbwI1ebuLtbjDRG/Km1q0bVrmz5Znl2+ sjvxWihcS33EeltmWO5LPXBxTFCF5YD2wMtN0fo+mSCjCS9KzpsnnZsZWmsQjVuL6CtY a0Jw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=VLCX60xDN95R/hkisFQZbB0IPW28fZj3AYxdURJZx0A=; b=mGsn33UlHmm4CsI11yyqc+GWUFb+6DODmrlAQQn81IvkguPZ6VVuFGbePG/zxzKAfH gXO2y3ZqiKjtg0BJlibqgCIag9t83g5yBSyEPr7KV2sogbpPRv3RwN0ghcCM2aFnkmBr Lo9FEFgvQ1nuxdc6AzASZy59t6wEW3Jxbgq2ZdFkRqCPFc+/MIFZinv6N6gOfTwabn39 1Lr/c2uX8u2n7JZqutWNu0H7vkvW1bM9o+5q78E3h+F6CcEK1ewzb8I2cA59162BvszM BsLhu4l25Me3LCRnUYyLrp/jej8FXWK7pSodML9YCtFGzUjaDYAUuDnXmnd2JytVGJN3 Js1g== X-Gm-Message-State: AOAM530I1rAQY6ozVGrLObEiSVkD7T7etu6LJWEki25hmHZWQiT+G2Vb 7FieQbQetEig6QWPPd4zZgZDFxKGfCI= X-Google-Smtp-Source: ABdhPJxwMGJnBjYF/NwivnGDBpydUmkDa7qu7bq2ObKA9bi+WCMVzq5mLqeX+6hKpcNJWWn1fN58pQ== X-Received: by 2002:a5d:4904:: with SMTP id x4mr5440904wrq.69.1616621557509; Wed, 24 Mar 2021 14:32:37 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a17sm3711509wmj.9.2021.03.24.14.32.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Mar 2021 14:32:37 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Wed, 24 Mar 2021 21:32:29 +0000 Subject: [PATCH 3/7] merge-ort: add code to check for whether cached renames can be reused Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren We need to know when renames detected in a previous merge operation can be reused in a later merge operation. Consider the following setup (from the git-rebase manpage): A---B---C topic / D---E---F---G master After rebasing, this will appear as: A'--B'--C' topic / D---E---F---G master Further, let's say that 'oldfile' was renamed to 'newfile' between E and G. The rebase or cherry-pick of A onto G will involve a three-way merge between E (as the merge base) and G and A. After detecting the rename between E:oldfile and G:newfile, there will be a three-way content merge of the following: E:oldfile G:newfile A:oldfile and produce a new result: A':newfile Now, when we want to pick B onto A', we will need to do a three-way merge between A (as the merge-base) and A' and B. This will involve a three-way content merge of A:oldfile A':newfile B:oldfile but only if we can detect that A:oldfile is similar enough to A':newfile to be used together in a three-way content merge, i.e. only if we can detect that A:oldfile and A':newfile are a rename. But we already know that A:oldfile and A':newfile are similar enough to be used in a three-way content merge, because that is precisely where A':newfile came from in the previous merge. Note that A & A' both appear in both merges. That gives us the condition under which we can reuse renames. There are a couple important points about this optimization: - If the rebase or cherry-pick halts for user conflicts, these caches are NOT saved anywhere. Thus, resuming a halted rebase or cherry-pick will result in no reused renames for the next commit. This is intentional, as user resolution can change files significantly and in ways that violate the similarity assumptions here. - Technically, in a *very* narrow case this might give slightly different results for rename detection. Using the example above, if: * E:oldfile had 20 lines * G:newfile added 10 new lines at the beginning of the file * A:oldfile deleted all but the first three lines of the file then => A':newfile would have 13 lines, 3 of which matches those in A:oldfile. Consider the two cases: * Without this optimization: - the next step of the rebase operation (moving B to B') would not detect the rename betwen A:oldfile and A':newfile - we'd thus get a modify/delete conflict with the rebase operation halting for the user to resolve, and have both A':newfile and B:oldfile sitting in the working tree. * With this optimization: - the rename between A:oldfile and A':newfile would be detected via the cache of renames - a three-way merge between A:oldfile, A':newfile, and B:oldfile would commence and be written to A':newfile Now, is the difference in behavior a bug...or a bugfix? I can't tell. Given that A:oldfile and A':newfile are not very similar, when we three-way merge with B:oldfile it seems likely we'll hit a conflict for the user to resolve. And it shouldn't be too hard for users to see why we did that three-way merge; oldfile and newfile *were* renames somewhere in the sequence. So, most of these corner cases will still behave similarly -- namely, a conflict given to the user to resolve. Also, consider the interesting case when commit B is a clean revert of commit A. Without this optimization, a rebase could not both apply a weird patch like A and then immediately revert it; users would be forced to resolve merge conflicts. With this optimization, it would successfully apply the clean revert. So, there is certainly at least one case that behaves better. Even if it's considered a "difference in behavior", I think both behaviors are reasonable, and the time savings provided by this optimization justify using the slightly altered rename heuristics. Signed-off-by: Elijah Newren --- merge-ort.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 64 insertions(+), 2 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index 2303d88e6a92..bb47fa91a339 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -139,6 +139,30 @@ struct rename_info { int callback_data_nr, callback_data_alloc; char *callback_data_traverse_path; + /* + * merge_trees: trees passed to the merge algorithm for the merge + * + * merge_trees records the trees passed to the merge algorithm. But, + * this data also is stored in merge_result->priv. If a sequence of + * merges are being done (such as when cherry-picking or rebasing), + * the next merge can look at this and re-use information from + * previous merges under certain cirumstances. + * + * See also all the cached_* variables. + */ + struct tree *merge_trees[3]; + + /* + * cached_pairs_valid_side: which side's cached info can be reused + * + * See the description for merge_trees. For repeated merges, at most + * only one side's cached information can be used. Valid values: + * MERGE_SIDE2: cached data from side2 can be reused + * MERGE_SIDE1: cached data from side1 can be reused + * 0: no cached data can be reused + */ + int cached_pairs_valid_side; + /* * cached_pairs: Caching of renames and deletions. * @@ -461,6 +485,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, strmap_func(&renames->cached_pairs[i], 1); strset_func(&renames->cached_irrelevant[i]); } + renames->cached_pairs_valid_side = 0; + renames->dir_rename_mask = 0; if (!reinitialize) { struct hashmap_iter iter; @@ -483,8 +509,6 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, strmap_clear(&opti->output, 0); } - renames->dir_rename_mask = 0; - /* Clean out callback_data as well. */ FREE_AND_NULL(renames->callback_data); renames->callback_data_nr = renames->callback_data_alloc = 0; @@ -3792,6 +3816,35 @@ static void merge_start(struct merge_options *opt, struct merge_result *result) trace2_region_leave("merge", "allocate/init", opt->repo); } +static void merge_check_renames_reusable(struct merge_options *opt, + struct merge_result *result, + struct tree *merge_base, + struct tree *side1, + struct tree *side2) +{ + struct rename_info *renames; + struct tree **merge_trees; + struct merge_options_internal *opti = result->priv; + + if (!opti) + return; + + renames = &opti->renames; + merge_trees = renames->merge_trees; + /* merge_trees[0..2] will only be NULL if opti is */ + assert(merge_trees[0] && merge_trees[1] && merge_trees[2]); + + /* Check if we meet a condition for re-using cached_pairs */ + if ( oideq(&merge_base->object.oid, &merge_trees[2]->object.oid) && + oideq( &side1->object.oid, &result->tree->object.oid)) + renames->cached_pairs_valid_side = MERGE_SIDE1; + else if (oideq(&merge_base->object.oid, &merge_trees[1]->object.oid) && + oideq( &side2->object.oid, &result->tree->object.oid)) + renames->cached_pairs_valid_side = MERGE_SIDE2; + else + renames->cached_pairs_valid_side = 0; /* neither side valid */ +} + /*** Function Grouping: merge_incore_*() and their internal variants ***/ /* @@ -3939,7 +3992,16 @@ void merge_incore_nonrecursive(struct merge_options *opt, trace2_region_enter("merge", "merge_start", opt->repo); assert(opt->ancestor != NULL); + merge_check_renames_reusable(opt, result, merge_base, side1, side2); merge_start(opt, result); + /* + * Record the trees used in this merge, so if there's a next merge in + * a cherry-pick or rebase sequence it might be able to take advantage + * of the cached_pairs in that next merge. + */ + opt->priv->renames.merge_trees[0] = merge_base; + opt->priv->renames.merge_trees[1] = side1; + opt->priv->renames.merge_trees[2] = side2; trace2_region_leave("merge", "merge_start", opt->repo); merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result); From patchwork Wed Mar 24 21:32:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12162429 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1DE4C433E3 for ; Wed, 24 Mar 2021 21:33:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9A60261A33 for ; Wed, 24 Mar 2021 21:33:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238501AbhCXVcq (ORCPT ); Wed, 24 Mar 2021 17:32:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57444 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238493AbhCXVck (ORCPT ); Wed, 24 Mar 2021 17:32:40 -0400 Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 81D02C06174A for ; Wed, 24 Mar 2021 14:32:39 -0700 (PDT) Received: by mail-wm1-x334.google.com with SMTP id y124-20020a1c32820000b029010c93864955so1989958wmy.5 for ; Wed, 24 Mar 2021 14:32:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=MlU//xxMNdCKgYSkstMNl9UIw6xYo68RnWfaUVnXDfo=; b=F4BpLJOlOjp8E/+26AQCqklZ7zzvnS7nw/vEAR0aZBVfWsjO2AuqNJ+stQ7YTdB90n vYqQLrKgGO2K625JleLjZ2LPCkWpKZJks6kenHU6LUvwF6TyzCLVCJjC9xos2aO7XGPE WoQ30fegyQ+BX3P9aKfjB/MY9NzEVLxn7NYJ4OUUtlKym2Rw/yB4qNfJ0I8wGYJr6o7R g+RAB/2VVXqTGNeIeYEKWLbHWjnC22p1jrvvsd8ojlFoMo4G799hBtf2NG1itpCJUYQ9 cuvjzwAH8EBTGIVQMJ2DPc91Da6fXGjfPnNzwtALuA6mg+VVGA8dGUTh+NL1JtwDAoey yCdQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=MlU//xxMNdCKgYSkstMNl9UIw6xYo68RnWfaUVnXDfo=; b=m9uqUVw69Ehgbft7YmCfxQOlgC+Na1k81tSCL6qKfx5V7IXLjKZVG0yPT+TvgtQzaa RHXezW8QEL4qwW60ZB1I7PRonuiw6DrON4p/94eqGBNpbUr0sMSQpVajgEmWocwzIAQf qFUP+GUBO0sOuOD369d37smQiTuv47LJrOwirLC5Jh4lWGyMK4/ab7BTq+h7riLAR2Kq 6kfEIrLw9jjH90aZZWP2SbStRk8cKCT1HKa1sDpBTB5OifC9LMBPpf/lyJRBmjcuo2my shRiJ8pi1ffrdMZZV5Be8/kUP08G0i1/ElNzR6sUx/dNa+hymFjEfNF4VbmjyDMEV8Xe C4+Q== X-Gm-Message-State: AOAM532Qyi0cVOd21B+hN6SJ5mU6Z5tdBSmbua1EMkdzBHaxhPQsvfac gjW4sL2WlJUUmZ0bZobuF+EjTAZprhI= X-Google-Smtp-Source: ABdhPJykr4T/7u044u9IQObietsIN74I0edjSgeUyKwRiY+cwcvZ39yySLEQ0HIND9X2sxSZCZRNMw== X-Received: by 2002:a1c:21c3:: with SMTP id h186mr4676515wmh.32.1616621558233; Wed, 24 Mar 2021 14:32:38 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l1sm3190709wrv.87.2021.03.24.14.32.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Mar 2021 14:32:37 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Wed, 24 Mar 2021 21:32:30 +0000 Subject: [PATCH 4/7] merge-ort: avoid accidental API mis-use Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Previously, callers of the merge-ort API could have passed an uninitialized value for struct merge_result *result. However, we want to check result to see if it has cached renames from a previous merge that we can reuse; such values would be found behind result->priv. However, if result->priv is uninitialized, attempting to access behind it will give a segfault. So, we need result->priv to be NULL (which will be the case if the caller does a memset(&result, 0)), or be written by a previous call to the merge-ort machinery. Documenting this requirement may help, but despite being the person who introduced this requirement, I still missed it once and it did not fail in a very clear way and led to a long debugging session. Add a _properly_initialized field to merge_result; that value will be 0 if the caller zero'ed the merge_result, it will be set to a very specific value by a previous run by the merge-ort machinery, and if it's uninitialized it will most likely either be 0 or some value that does not match the specific one we'd expect allowing us to throw a much more meaningful error. Signed-off-by: Elijah Newren --- merge-ort.c | 7 +++++++ merge-ort.h | 2 ++ 2 files changed, 9 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index bb47fa91a339..5d56b0f90128 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -52,6 +52,8 @@ enum merge_side { MERGE_SIDE2 = 2 }; +static unsigned RESULT_INITIALIZED = 0x1abe11ed; /* unlikely accidental value */ + struct traversal_callback_data { unsigned long mask; unsigned long dirmask; @@ -3736,6 +3738,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result) assert(opt->obuf.len == 0); assert(opt->priv == NULL); + if (result->_properly_initialized != 0 && + result->_properly_initialized != RESULT_INITIALIZED) + BUG("struct merge_result passed to merge_incore_*recursive() must be zeroed or filled with values from a previous run"); + assert(!!result->priv == !!result->_properly_initialized); if (result->priv) { opt->priv = result->priv; result->priv = NULL; @@ -3895,6 +3901,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt, result->clean &= strmap_empty(&opt->priv->conflicted); if (!opt->priv->call_depth) { result->priv = opt->priv; + result->_properly_initialized = RESULT_INITIALIZED; opt->priv = NULL; } } diff --git a/merge-ort.h b/merge-ort.h index d53a0a339f33..c011864ffeb1 100644 --- a/merge-ort.h +++ b/merge-ort.h @@ -29,6 +29,8 @@ struct merge_result { * !clean) and to print "CONFLICT" messages. Not for external use. */ void *priv; + /* Also private */ + unsigned _properly_initialized; }; /* From patchwork Wed Mar 24 21:32:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12162435 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE808C433E5 for ; Wed, 24 Mar 2021 21:33:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BB90061A24 for ; Wed, 24 Mar 2021 21:33:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238492AbhCXVcp (ORCPT ); Wed, 24 Mar 2021 17:32:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238495AbhCXVck (ORCPT ); Wed, 24 Mar 2021 17:32:40 -0400 Received: from mail-wr1-x435.google.com (mail-wr1-x435.google.com [IPv6:2a00:1450:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 723D5C06174A for ; Wed, 24 Mar 2021 14:32:40 -0700 (PDT) Received: by mail-wr1-x435.google.com with SMTP id j7so267489wrd.1 for ; Wed, 24 Mar 2021 14:32:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=ErFfbaVocqHONkBYYrYroPIP2f0Kvb63Ez6Rku7Eens=; b=tj/HbXMnHSE00ZsTw7tmjQTobaFzumcwki0plAPNjLBYe0oRU0Zp+yf12MjZriGNbQ ZbY8KipV+Lmsxs+WiMYz5NO97pCAO3HHWA/oHi2NUBIncHlz7kznohGLEt5gLOeGz+N1 WXDYChJDFHzyRDMT5RVLATyPojPXVqShBQFGDR5aASeXr7x7RjqLPfn1ycST3Ygfm7jN 05KuzujkamBaecfb18vJQg+vfO0izWt5YoFJze1FoZVohQFEVXdvVEV2/XeHLcbLT8H+ JdPBXf3fpVkToH6qkO+dgQbiTWMh2bmEbN2X/uVDgJm3U5RWXXGMGDK73ybn5sM2xh8o FtsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=ErFfbaVocqHONkBYYrYroPIP2f0Kvb63Ez6Rku7Eens=; b=oqOSWb4YjfIe8xvYfQDRyZX+cB4CNOLt+4C+8oko75fPirr/8Q7UYLGDKLohE5nRfS 1Y5xC7BdUETlJqDvti0Y5XzNnaPwu8PBZhYiNz9LmaaqeJui1Co0v37I/kgSlaHEJM+9 CIEDRYmOygQEdB2ucnwkcNZjqeQyhAUEIOwcUF0H72MugeDzWtcQ503xXYMMojEwD7F2 M+2L05Y3FM9n/0lasxVlvmnMdfpPTNxl9rQdsy5xqBK4KLdO6G0Rbn8ryb29Vw1G/Stm /OO3fbMB68XGyfRcyvXrmQ+RewP3zXJIS07sBD0fvI7IRc7JPUFPjxNWvu44EObdwnUH 6EUQ== X-Gm-Message-State: AOAM533nV2UAK9VSr6lrnT3Kt6ZS8o1BgCBEecoW4WPppS2ql5mUJIJX JxrO7jADZ+eyrsbVNnLDd/I9fzGqQos= X-Google-Smtp-Source: ABdhPJwoRaFBF8yiRROjAC81LQf5T3oCCxyUdtLWuugL67qc+CzfFCpHWdN1okxGvJ3cw4z3GQn3Vg== X-Received: by 2002:adf:9bce:: with SMTP id e14mr5792233wrc.29.1616621559235; Wed, 24 Mar 2021 14:32:39 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id m11sm4866036wri.44.2021.03.24.14.32.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Mar 2021 14:32:38 -0700 (PDT) Message-Id: <39f7e384611dd335e6d83ed4ca764a1ee022576f.1616621553.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Wed, 24 Mar 2021 21:32:31 +0000 Subject: [PATCH 5/7] merge-ort: preserve cached renames for the appropriate side Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Previous commits created an in-memory cache of the results of rename detection, and added logic to detect when that cache could appropriately be used in a subsequent merge operation -- but we were still unconditionally clearing the cache with each new merge operation anyway. If it is valid to reuse the cache from one of the two sides of history, preserve that side. Signed-off-by: Elijah Newren --- merge-ort.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index 5d56b0f90128..0dffd65ee1a3 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -475,17 +475,18 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, /* Free memory used by various renames maps */ for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) { strintmap_func(&renames->dirs_removed[i]); - - partial_clear_dir_rename_count(&renames->dir_rename_count[i]); - if (!reinitialize) - strmap_clear(&renames->dir_rename_count[i], 1); - strmap_func(&renames->dir_renames[i], 0); - strintmap_func(&renames->relevant_sources[i]); - strset_func(&renames->cached_target_names[i]); - strmap_func(&renames->cached_pairs[i], 1); - strset_func(&renames->cached_irrelevant[i]); + if (!reinitialize) + assert(renames->cached_pairs_valid_side == 0); + if (i != renames->cached_pairs_valid_side) { + strset_func(&renames->cached_target_names[i]); + strmap_func(&renames->cached_pairs[i], 1); + strset_func(&renames->cached_irrelevant[i]); + partial_clear_dir_rename_count(&renames->dir_rename_count[i]); + if (!reinitialize) + strmap_clear(&renames->dir_rename_count[i], 1); + } } renames->cached_pairs_valid_side = 0; renames->dir_rename_mask = 0; From patchwork Wed Mar 24 21:32:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12162437 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB05DC433E0 for ; Wed, 24 Mar 2021 21:33:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B46FD61A1E for ; Wed, 24 Mar 2021 21:33:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234294AbhCXVdR (ORCPT ); Wed, 24 Mar 2021 17:33:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57456 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234100AbhCXVcm (ORCPT ); Wed, 24 Mar 2021 17:32:42 -0400 Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F429C061763 for ; Wed, 24 Mar 2021 14:32:42 -0700 (PDT) Received: by mail-wm1-x335.google.com with SMTP id 12so54053wmf.5 for ; Wed, 24 Mar 2021 14:32:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=sIoLEFCpV3kQMDL0jJKIAKrKsyCh3Picyuf14rjmjxQ=; b=RQEoPzCOHe/HIWvU3QrBapijgldeuAPQomFAtYO8U6FOYa8IJ+iucj9Uv5YGE3CnRM Dm7zHpn6ZsyVjy0zUZUKlv5vlh3jbIzDsbui/UKn1Iz12FUHigqqV67Lo/PfnT3CV63Z TQsT3byZOI07tbJp6eY80o3gXwg7Pg5MqMGl+lUBusqrNLPScJons4CeSP2zxQFz6864 bev0wHHZsVPSqX8idL8bNcNgSInZIofOTh5mJRsqk4zmoUvEHSU1K68XZMv/1mi9x9Go 3SOU6Syf2+rH0a3iJr3Z5w1wRxz1ufJ/RUjnmsHlRg8jfmrjVtESbr3fCdkbhcqf/ABL rz4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=sIoLEFCpV3kQMDL0jJKIAKrKsyCh3Picyuf14rjmjxQ=; b=LHTOqc2rqqxEVmTR0Np3g1p8hTuUPNyhU2VIxhLSlIOcqHZqSjxKXfhjO+QTn78wDf efXbygzRumXgEB17pyh9pvhMAV1Hofm8xlidPqc3wUas2c9DIDOW7ED+pHhFBRbh5F+k U3EAfiVBWarYSvL+o+ULwHucIMpwTv06yzLwPxuBIka3ld7TtQTfOyUcRgton67b1Ufo FFEmq5fLTJuba4q43aNTtFnIxWN5f5SXX//mTEjrSUlcsBySv+QJG/nZx+dI5vn8z5tR hEVPxIgDFLU0Voc+dUnIH+s+9T9I8NaKfPnv6Ak4va0WPj8u1r3uIAMOrbKAvW17tTQf FM4A== X-Gm-Message-State: AOAM531DRlAPbQQ/J6jrdZqpya3PkPtQG1qUO0BlUTgrsdE78XkUwasy 9+uIybB1wPwp53SKTrF415ZZw9KVxD0= X-Google-Smtp-Source: ABdhPJx5draj712IleMBvfzZjXlNty4peTle6I7lgmYQ01SlhEScx/DCM2y4S478qOESR7AWc2QpPw== X-Received: by 2002:a1c:ac02:: with SMTP id v2mr4786404wme.111.1616621559809; Wed, 24 Mar 2021 14:32:39 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id j30sm5212714wrj.62.2021.03.24.14.32.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Mar 2021 14:32:39 -0700 (PDT) Message-Id: <7d5d94b34ba5afdb406ee806434d66b7ddf85dcf.1616621554.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Wed, 24 Mar 2021 21:32:32 +0000 Subject: [PATCH 6/7] merge-ort: add helper functions for using cached renames Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren If we have a usable rename cache, then we can remove from relevant_sources all the paths that were cached; diffcore_rename_extended() can then consider an even smaller set of relevant_sources in its rename detection. However, when diffcore_rename_extended() is done, we will need to take the renames it detected and then add back in all the ones we had cached from before. Add helper functions for doing these two operations; the next commit will make use of them. Signed-off-by: Elijah Newren --- merge-ort.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index 0dffd65ee1a3..8d0353ffbca2 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2360,6 +2360,53 @@ static void resolve_diffpair_statuses(struct diff_queue_struct *q) } } +MAYBE_UNUSED +static void prune_cached_from_relevant(struct rename_info *renames, + unsigned side) +{ + /* Reason for this function described in add_pair() */ + struct hashmap_iter iter; + struct strmap_entry *entry; + + /* Remove from relevant_sources all entries in cached_pairs[side] */ + strmap_for_each_entry(&renames->cached_pairs[side], &iter, entry) { + strintmap_remove(&renames->relevant_sources[side], + entry->key); + } + /* Remove from relevant_sources all entries in cached_irrelevant[side] */ + strset_for_each_entry(&renames->cached_irrelevant[side], &iter, entry) { + strintmap_remove(&renames->relevant_sources[side], + entry->key); + } +} + +MAYBE_UNUSED +static void use_cached_pairs(struct merge_options *opt, + struct strmap *cached_pairs, + struct diff_queue_struct *pairs) +{ + struct hashmap_iter iter; + struct strmap_entry *entry; + + /* + * Add to side_pairs all entries from renames->cached_pairs[side_index]. + * (Info in cached_irrelevant[side_index] is not relevant here.) + */ + strmap_for_each_entry(cached_pairs, &iter, entry) { + struct diff_filespec *one, *two; + const char *old_name = entry->key; + const char *new_name = entry->value; + if (!new_name) + new_name = old_name; + + /* We don't care about oid/mode, only filenames and status */ + one = alloc_filespec(old_name); + two = alloc_filespec(new_name); + diff_queue(pairs, one, two); + pairs->queue[pairs->nr-1]->status = entry->value ? 'R' : 'D'; + } +} + static void possibly_cache_new_pair(struct rename_info *renames, struct diff_filepair *p, unsigned side, From patchwork Wed Mar 24 21:32:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12162439 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2E09C433C1 for ; Wed, 24 Mar 2021 21:33:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C674861581 for ; Wed, 24 Mar 2021 21:33:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238491AbhCXVdS (ORCPT ); Wed, 24 Mar 2021 17:33:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57454 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238496AbhCXVcm (ORCPT ); Wed, 24 Mar 2021 17:32:42 -0400 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C3326C06174A for ; Wed, 24 Mar 2021 14:32:41 -0700 (PDT) Received: by mail-wr1-x42f.google.com with SMTP id x13so226694wrs.9 for ; Wed, 24 Mar 2021 14:32:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=S2bMsGGSF7D1TN6tGpEGniPpAauwDkTbmqQnwnF6mW8=; b=Nn0275kXVdprVf3cu6jqCTq0XOWRi/P7Mq8cAk8MXuEEFRRbnAYK6dqsVe4T98HR2Y fZJs7pXMTUmii2RKVMZOptduG6FlFJkgmdcrxoKNo8vsT2TylWlIgswB2oyLx1xuQ3zZ /Cuzlk7OXUGhOv0LrdsIFREdmZmBXaNrz2UrY2O2s/nPsPZGb42Z01O7WXunxRTwy0h9 hxMToonKXShTPsgFjoZ93/ZtPQvx4viKD2DFTp+ymyuNDPJgdWK3SyxwVOTyyck8wUjK rqvYgZ0uIKeCfWgW0E3mR00HqTps8V9zFcEbFygJmW8NKQP3woDwYak/AsJ9E++fkpo9 fPAg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=S2bMsGGSF7D1TN6tGpEGniPpAauwDkTbmqQnwnF6mW8=; b=aq7jc7kkuGtR3zw3Io+/6uNrIAkyl9xaHHKMvUnDY15ybk+nvxejfi9iCRBBF4qMg4 P9UVx4wczgcTUeZWrIUXiXKWpiG9W5dc0GBvS882Gr5D9pHAf7pxC+mfRd7/ufnuqCg6 qRHJvfdlDDcvlArHjhxhRB8NCuvRbuW7bZON5Qfhb+PtTRYDGtyDtpBCXquFLejDgVQP T4kddOdZkDJo84Uw6kI+so9vSAj8c4Pxzbljr/qU31e0XR0pUVVHDkx/SPOBHdGDj/Fs XN+GejbAz6q6tnolUY44+iCe2n2A/Xek5fd3uHb+dQNem2FuuxLyi7NJebRTFDJl/i/W oBwQ== X-Gm-Message-State: AOAM533WU4VckY85VzPUtwbBX3bCPkYeaawlFQpKON9Va8wTaFp+eG3c wLjzl7q7+EWXU2GVkzCgoVMT+Y40eZU= X-Google-Smtp-Source: ABdhPJwPTgOe2pcqSqOIng2Y/MoWXivEfYQfiH+3g9fEmeTmolfg8UkkLgrN5UoV35TWpGXhu8pflg== X-Received: by 2002:a5d:5047:: with SMTP id h7mr5692228wrt.111.1616621560448; Wed, 24 Mar 2021 14:32:40 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d204sm4072068wmc.17.2021.03.24.14.32.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Mar 2021 14:32:40 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Wed, 24 Mar 2021 21:32:33 +0000 Subject: [PATCH 7/7] merge-ort, diffcore-rename: employ cached renames when possible MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.624 s ± 0.077 s mega-renames: 11.435 s ± 0.158 s 10.213 s ± 0.032 s just-one-mega: 494.2 ms ± 6.1 ms 497.6 ms ± 5.3 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.814 s ± 0.094 s mega-renames: 1799.937 s ± 0.493 s 303.225 s ± 1.330 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. Signed-off-by: Elijah Newren --- diffcore-rename.c | 22 ++++++++++++++++++---- diffcore.h | 3 ++- merge-ort.c | 48 ++++++++++++++++++++++++++++++++++++++++++----- 3 files changed, 63 insertions(+), 10 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 7cc24592617e..dfbe65c917e9 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -568,7 +568,8 @@ static void update_dir_rename_counts(struct dir_rename_info *info, static void initialize_dir_rename_info(struct dir_rename_info *info, struct strintmap *relevant_sources, struct strintmap *dirs_removed, - struct strmap *dir_rename_count) + struct strmap *dir_rename_count, + struct strmap *cached_pairs) { struct hashmap_iter iter; struct strmap_entry *entry; @@ -633,6 +634,17 @@ static void initialize_dir_rename_info(struct dir_rename_info *info, rename_dst[i].p->two->path); } + /* Add cached_pairs to counts */ + strmap_for_each_entry(cached_pairs, &iter, entry) { + const char *old_name = entry->key; + const char *new_name = entry->value; + if (!new_name) + /* known delete; ignore it */ + continue; + + update_dir_rename_counts(info, dirs_removed, old_name, new_name); + } + /* * Now we collapse * dir_rename_count: old_directory -> {new_directory -> count} @@ -1247,7 +1259,8 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info, void diffcore_rename_extended(struct diff_options *options, struct strintmap *relevant_sources, struct strintmap *dirs_removed, - struct strmap *dir_rename_count) + struct strmap *dir_rename_count, + struct strmap *cached_pairs) { int detect_rename = options->detect_rename; int minimum_score = options->rename_score; @@ -1363,7 +1376,8 @@ void diffcore_rename_extended(struct diff_options *options, /* Preparation for basename-driven matching. */ trace2_region_enter("diff", "dir rename setup", options->repo); initialize_dir_rename_info(&info, relevant_sources, - dirs_removed, dir_rename_count); + dirs_removed, dir_rename_count, + cached_pairs); trace2_region_leave("diff", "dir rename setup", options->repo); /* Utilize file basenames to quickly find renames. */ @@ -1561,5 +1575,5 @@ void diffcore_rename_extended(struct diff_options *options, void diffcore_rename(struct diff_options *options) { - diffcore_rename_extended(options, NULL, NULL, NULL); + diffcore_rename_extended(options, NULL, NULL, NULL, NULL); } diff --git a/diffcore.h b/diffcore.h index cf8d4cb8617d..de01e64becaf 100644 --- a/diffcore.h +++ b/diffcore.h @@ -181,7 +181,8 @@ void diffcore_rename(struct diff_options *); void diffcore_rename_extended(struct diff_options *options, struct strintmap *relevant_sources, struct strintmap *dirs_removed, - struct strmap *dir_rename_count); + struct strmap *dir_rename_count, + struct strmap *cached_pairs); void diffcore_merge_broken(void); void diffcore_pickaxe(struct diff_options *); void diffcore_order(const char *orderfile); diff --git a/merge-ort.c b/merge-ort.c index 8d0353ffbca2..719222aa4364 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -752,15 +752,48 @@ static void add_pair(struct merge_options *opt, struct rename_info *renames = &opt->priv->renames; int names_idx = is_add ? side : 0; - if (!is_add) { + if (is_add) { + if (strset_contains(&renames->cached_target_names[side], + pathname)) + return; + } else { unsigned content_relevant = (match_mask == 0); unsigned location_relevant = (dir_rename_mask == 0x07); + /* + * If pathname is found in cached_irrelevant[side] due to + * previous pick but for this commit content is relevant, + * then we need to remove it from cached_irrelevant. + */ + if (content_relevant) + /* strset_remove is no-op if strset doesn't have key */ + strset_remove(&renames->cached_irrelevant[side], + pathname); + + /* + * We do not need to re-detect renames for paths that we already + * know the pairing, i.e. for cached_pairs (or + * cached_irrelevant). However, handle_deferred_entries() needs + * to loop over the union of keys from relevant_sources[side] and + * cached_pairs[side], so for simplicity we set relevant_sources + * for all the cached_pairs too and then strip them back out in + * prune_cached_from_relevant() at the beginning of + * detect_regular_renames(). + */ if (content_relevant || location_relevant) { /* content_relevant trumps location_relevant */ strintmap_set(&renames->relevant_sources[side], pathname, content_relevant ? RELEVANT_CONTENT : RELEVANT_LOCATION); } + + /* + * Avoid creating pair if we've already cached rename results. + * Note that we do this after setting relevant_sources[side] + * as noted in the comment above. + */ + if (strmap_contains(&renames->cached_pairs[side], pathname) || + strset_contains(&renames->cached_irrelevant[side], pathname)) + return; } one = alloc_filespec(pathname); @@ -2336,7 +2369,9 @@ static inline int possible_side_renames(struct rename_info *renames, static inline int possible_renames(struct rename_info *renames) { return possible_side_renames(renames, 1) || - possible_side_renames(renames, 2); + possible_side_renames(renames, 2) || + !strmap_empty(&renames->cached_pairs[1]) || + !strmap_empty(&renames->cached_pairs[2]); } static void resolve_diffpair_statuses(struct diff_queue_struct *q) @@ -2360,7 +2395,6 @@ static void resolve_diffpair_statuses(struct diff_queue_struct *q) } } -MAYBE_UNUSED static void prune_cached_from_relevant(struct rename_info *renames, unsigned side) { @@ -2380,7 +2414,6 @@ static void prune_cached_from_relevant(struct rename_info *renames, } } -MAYBE_UNUSED static void use_cached_pairs(struct merge_options *opt, struct strmap *cached_pairs, struct diff_queue_struct *pairs) @@ -2463,6 +2496,7 @@ static void detect_regular_renames(struct merge_options *opt, struct diff_options diff_opts; struct rename_info *renames = &opt->priv->renames; + prune_cached_from_relevant(renames, side_index); if (!possible_side_renames(renames, side_index)) { /* * No rename detection needed for this side, but we still need @@ -2473,6 +2507,7 @@ static void detect_regular_renames(struct merge_options *opt, return; } + partial_clear_dir_rename_count(&renames->dir_rename_count[side_index]); repo_diff_setup(opt->repo, &diff_opts); diff_opts.flags.recursive = 1; diff_opts.flags.rename_empty = 0; @@ -2490,7 +2525,8 @@ static void detect_regular_renames(struct merge_options *opt, diffcore_rename_extended(&diff_opts, &renames->relevant_sources[side_index], &renames->dirs_removed[side_index], - &renames->dir_rename_count[side_index]); + &renames->dir_rename_count[side_index], + &renames->cached_pairs[side_index]); trace2_region_leave("diff", "diffcore_rename", opt->repo); resolve_diffpair_statuses(&diff_queued_diff); @@ -2597,6 +2633,8 @@ static int detect_and_process_renames(struct merge_options *opt, trace2_region_enter("merge", "regular renames", opt->repo); detect_regular_renames(opt, MERGE_SIDE1); detect_regular_renames(opt, MERGE_SIDE2); + use_cached_pairs(opt, &renames->cached_pairs[1], &renames->pairs[1]); + use_cached_pairs(opt, &renames->cached_pairs[2], &renames->pairs[2]); trace2_region_leave("merge", "regular renames", opt->repo); trace2_region_enter("merge", "directory renames", opt->repo);