From patchwork Thu Jan 9 08:28:18 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13932167 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 95B621474A9 for ; Thu, 9 Jan 2025 08:28:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736411302; cv=none; b=Jx4yobPVnDoN/9Ve78NbWWiMj9tasOwW7XZJCEwkL0icyRJdfQnwr+KkTrN9KrPovJ2EsdyhPVi682FAu8K84YpLl93fpKMeZrDKVxyZ1zsBmgnwYLNnwhUFrhorrNo2oS+5Navg2XP10m5yylyLBSnwdzvvc7tRva5dAdLkicU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736411302; c=relaxed/simple; bh=cArmnpCmd6pAC8db4ltR9mcqlveZ7lJ53Ch2qLUrZ4s=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=Gf/IlO6sneotueYpZknbuqDt6MX2CQ5fu8a3FL9877cBJRWnc6Y5Y8WCJxNbQ2duTv+IVasxOIGsPTkfWH3uwCaYhHyPnTtm2yBKw2jMCu3xtysBi1Ar90+wm7B4wzP3OoR6QSOXgjyExsuAv3YhMPS47KKYZzhSUowGypJoMlQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=EGryqRnH; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="EGryqRnH" Received: (qmail 25590 invoked by uid 109); 9 Jan 2025 08:28:19 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=cArmnpCmd6pAC8db4ltR9mcqlveZ7lJ53Ch2qLUrZ4s=; b=EGryqRnH49rIznR0rHuTHIRdIvMyo5HVv+L7y40yoHF9fuaIE6ahwaZzk0ZEhrNu+qCylZau0VQMLVzHj4UPQFWs0NHTl1iYYGMYfM1XKFJpoHpNF7WqsQ7HDfUV2UP8iG3E7eAs7u9p8YL9XAFnbGxCSyRf21VP10vazIlllp5fKypXltsQCF76xyQV3GrcTsr2+pS+SpcNMZ3QPtMV/7OSh6S1s6BQ/eL7+MPnJujLAIOprRfOpwCQbH0SCm1JKOlD2dQpa3bhCfSk0h9qh3BlavpTdyNBkbCP1mC9C1o8f+eRp15kB1SzCg1pmCqUbzhvn/J3LiCjBN96zB32zA== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Jan 2025 08:28:19 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20063 invoked by uid 111); 9 Jan 2025 08:28:19 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Jan 2025 03:28:19 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Jan 2025 03:28:18 -0500 From: Jeff King To: Git List Cc: Junio C Hamano , Wink Saville Subject: [PATCH 01/14] run_diff_files(): delay allocation of combine_diff_path Message-ID: <20250109082818.GA2748836@coredump.intra.peff.net> References: <20250109082723.GA2748497@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20250109082723.GA2748497@coredump.intra.peff.net> While looping over the index entries, when we see a higher level stage the first thing we do is allocate a combine_diff_path struct for it. But this can leak; if check_removed() returns an error, we'll continue to the next iteration of the loop without cleaning up. We can fix this by just delaying the allocation by a few lines. I don't think this leak is triggered in the test suite, but it's pretty easy to see by inspection. My ulterior motive here is that the delayed allocation means we have all of the data needed to initialize "dpath" at the time of malloc, making it easier to factor out a constructor function. Signed-off-by: Jeff King --- diff-lib.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/diff-lib.c b/diff-lib.c index c6d3bc4d37..85b8f1fa59 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -156,18 +156,6 @@ void run_diff_files(struct rev_info *revs, unsigned int option) size_t path_len; struct stat st; - path_len = ce_namelen(ce); - - dpath = xmalloc(combine_diff_path_size(5, path_len)); - dpath->path = (char *) &(dpath->parent[5]); - - dpath->next = NULL; - memcpy(dpath->path, ce->name, path_len); - dpath->path[path_len] = '\0'; - oidclr(&dpath->oid, the_repository->hash_algo); - memset(&(dpath->parent[0]), 0, - sizeof(struct combine_diff_parent)*5); - changed = check_removed(ce, &st); if (!changed) wt_mode = ce_mode_from_stat(ce, st.st_mode); @@ -178,7 +166,19 @@ void run_diff_files(struct rev_info *revs, unsigned int option) } wt_mode = 0; } + + path_len = ce_namelen(ce); + + dpath = xmalloc(combine_diff_path_size(5, path_len)); + dpath->path = (char *) &(dpath->parent[5]); + + dpath->next = NULL; + memcpy(dpath->path, ce->name, path_len); + dpath->path[path_len] = '\0'; + oidclr(&dpath->oid, the_repository->hash_algo); dpath->mode = wt_mode; + memset(&(dpath->parent[0]), 0, + sizeof(struct combine_diff_parent)*5); while (i < entries) { struct cache_entry *nce = istate->cache[i]; From patchwork Thu Jan 9 08:32:36 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13932174 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B73AB13C9C4 for ; Thu, 9 Jan 2025 08:32:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736411560; cv=none; b=ViAJHPlKazO51HrRYtA/ij4MxfUFQWmnePDb28FXrxd/xsEjgq8xpZBgSxluXFQoUhjchcX6cjCXAlOUcRHSvbn54EfSerIq9C65aNnsyntIS3MsT+LLKaRwPo41X4ZG3te6xMEfXLGunH0G6IwRkzCloA49R2XKt64bl2nlI1E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736411560; c=relaxed/simple; bh=f7BGpA9HF0ZnklvmcMQTt7BR89OAEPdFoVv4hyhI54g=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=q1+GP7rz5CJL5dr0AZ9/6qbRNo0svFOt5O6Hfndpp0AYQRaO5qXdW9jX6TmWNAqZgdCWW31hn0xRnEeK/kU8xpzUbImNwwFOihKHhTDyXCtxLC+p2e7mbu6BMFaRps3VEL2ahufLOvSCanSwz55Yg67+SmfTB6hNq7sqTmDQ3DE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=VWre46i0; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="VWre46i0" Received: (qmail 25605 invoked by uid 109); 9 Jan 2025 08:32:37 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=f7BGpA9HF0ZnklvmcMQTt7BR89OAEPdFoVv4hyhI54g=; b=VWre46i010dIzERs6k7Meko63eU42utsbwE6UIP+giqLWa0UMwunSQusT6v7RLQo72s2WuVE0v+zbi2y3v2/AZgGusju/9n/euDQaZZg+Q9mt9T5zvYEhNjsFHCzFNmwmEk3IKgQkMzVIvNsQhReR642A3G0us16DGyPHgl4o82PAQPb7Czbg1Me/PiTnTuf7YOaFjmEiHSduLZA6jgwbSag9VqGF8jYnaUSb5t9YjGV/xTV2xzda9I3wjh19f7Lzpv09nemWhi2617nPC6ctVmeZ0dowjizF10hJrIQyyUkfLCn5J1/z27j3BP5ZBHA5Rg/hYnVaQ30vXV5E7cE+w== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Jan 2025 08:32:37 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20113 invoked by uid 111); 9 Jan 2025 08:32:37 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Jan 2025 03:32:37 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Jan 2025 03:32:36 -0500 From: Jeff King To: Git List Cc: Junio C Hamano , Wink Saville Subject: [PATCH 02/14] combine-diff: add combine_diff_path_new() Message-ID: <20250109083236.GB2748836@coredump.intra.peff.net> References: <20250109082723.GA2748497@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20250109082723.GA2748497@coredump.intra.peff.net> The combine_diff_path struct has variable size, since it embeds both the memory allocation for the path field as well as a variable-sized parent array. This makes allocating one a bit tricky. We have a helper to compute the required size, but it's up to individual sites to actually initialize all of the fields. Let's provide a constructor function to make that a little nicer. Besides being shorter, it also hides away tricky bits like the computation of the "path" pointer (which is right after the "parent" flex array). As a bonus, using the same constructor everywhere means that we'll consistently initialize all parts of the struct. A few code paths left the parent array unitialized. This didn't cause any bugs, but we'll be able to simplify some code in the next few patches knowing that the parent fields have all been zero'd. This also gets rid of some questionable uses of "int" to store buffer lengths. Though we do use them to allocate, I don't think there are any integer overflow vulnerabilities here (the allocation helper promotes them to size_t and checks arithmetic for overflow, and the actual memcpy of the bytes is done using the possibly-truncated "int" value). Sadly we can't use the FLEX_* macros to simplify the allocation here, because there are two variable-sized parts to the struct (and those macros only handle one). Nor can we get stop publicly declaring combine_diff_path_size(). This patch does not touch the code in path_appendnew() at all, which is not ready to be moved to our new constructor for a few reasons: - path_appendnew() has a memory-reuse optimization where it tries to reuse combine_diff_path structs rather than freeing and reallocating. - path_appendnew() does not create the struct from a single path string, but rather allocates and copies into the buffer from multiple sources. These can be addressed by some refactoring, but let's leave it as-is for now. Signed-off-by: Jeff King --- combine-diff.c | 40 ++++++++++++++++++++++++++-------------- diff-lib.c | 29 ++++++----------------------- diff.h | 5 +++++ 3 files changed, 37 insertions(+), 37 deletions(-) diff --git a/combine-diff.c b/combine-diff.c index 641bc92dbd..45548fd438 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -47,22 +47,13 @@ static struct combine_diff_path *intersect_paths( if (!n) { for (i = 0; i < q->nr; i++) { - int len; - const char *path; if (diff_unmodified_pair(q->queue[i])) continue; - path = q->queue[i]->two->path; - len = strlen(path); - p = xmalloc(combine_diff_path_size(num_parent, len)); - p->path = (char *) &(p->parent[num_parent]); - memcpy(p->path, path, len); - p->path[len] = 0; - p->next = NULL; - memset(p->parent, 0, - sizeof(p->parent[0]) * num_parent); - - oidcpy(&p->oid, &q->queue[i]->two->oid); - p->mode = q->queue[i]->two->mode; + p = combine_diff_path_new(q->queue[i]->two->path, + strlen(q->queue[i]->two->path), + q->queue[i]->two->mode, + &q->queue[i]->two->oid, + num_parent); oidcpy(&p->parent[n].oid, &q->queue[i]->one->oid); p->parent[n].mode = q->queue[i]->one->mode; p->parent[n].status = q->queue[i]->status; @@ -1667,3 +1658,24 @@ void diff_tree_combined_merge(const struct commit *commit, diff_tree_combined(&commit->object.oid, &parents, rev); oid_array_clear(&parents); } + +struct combine_diff_path *combine_diff_path_new(const char *path, + size_t path_len, + unsigned int mode, + const struct object_id *oid, + size_t num_parents) +{ + struct combine_diff_path *p; + + p = xmalloc(combine_diff_path_size(num_parents, path_len)); + p->path = (char *)&(p->parent[num_parents]); + memcpy(p->path, path, path_len); + p->path[path_len] = 0; + p->next = NULL; + p->mode = mode; + oidcpy(&p->oid, oid); + + memset(p->parent, 0, sizeof(p->parent[0]) * num_parents); + + return p; +} diff --git a/diff-lib.c b/diff-lib.c index 85b8f1fa59..471ef99614 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -153,7 +153,6 @@ void run_diff_files(struct rev_info *revs, unsigned int option) struct diff_filepair *pair; unsigned int wt_mode = 0; int num_compare_stages = 0; - size_t path_len; struct stat st; changed = check_removed(ce, &st); @@ -167,18 +166,8 @@ void run_diff_files(struct rev_info *revs, unsigned int option) wt_mode = 0; } - path_len = ce_namelen(ce); - - dpath = xmalloc(combine_diff_path_size(5, path_len)); - dpath->path = (char *) &(dpath->parent[5]); - - dpath->next = NULL; - memcpy(dpath->path, ce->name, path_len); - dpath->path[path_len] = '\0'; - oidclr(&dpath->oid, the_repository->hash_algo); - dpath->mode = wt_mode; - memset(&(dpath->parent[0]), 0, - sizeof(struct combine_diff_parent)*5); + dpath = combine_diff_path_new(ce->name, ce_namelen(ce), + wt_mode, null_oid(), 5); while (i < entries) { struct cache_entry *nce = istate->cache[i]; @@ -405,16 +394,10 @@ static int show_modified(struct rev_info *revs, if (revs->combine_merges && !cached && (!oideq(oid, &old_entry->oid) || !oideq(&old_entry->oid, &new_entry->oid))) { struct combine_diff_path *p; - int pathlen = ce_namelen(new_entry); - - p = xmalloc(combine_diff_path_size(2, pathlen)); - p->path = (char *) &p->parent[2]; - p->next = NULL; - memcpy(p->path, new_entry->name, pathlen); - p->path[pathlen] = 0; - p->mode = mode; - oidclr(&p->oid, the_repository->hash_algo); - memset(p->parent, 0, 2 * sizeof(struct combine_diff_parent)); + + p = combine_diff_path_new(new_entry->name, + ce_namelen(new_entry), + mode, null_oid(), 2); p->parent[0].status = DIFF_STATUS_MODIFIED; p->parent[0].mode = new_entry->ce_mode; oidcpy(&p->parent[0].oid, &new_entry->oid); diff --git a/diff.h b/diff.h index 6e6007c17b..5cddd5a870 100644 --- a/diff.h +++ b/diff.h @@ -486,6 +486,11 @@ struct combine_diff_path { #define combine_diff_path_size(n, l) \ st_add4(sizeof(struct combine_diff_path), (l), 1, \ st_mult(sizeof(struct combine_diff_parent), (n))) +struct combine_diff_path *combine_diff_path_new(const char *path, + size_t path_len, + unsigned int mode, + const struct object_id *oid, + size_t num_parents); void show_combined_diff(struct combine_diff_path *elem, int num_parent, struct rev_info *); From patchwork Thu Jan 9 08:33:10 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13932175 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 491E71FDA for ; Thu, 9 Jan 2025 08:33:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736411593; cv=none; b=GxRc4wNknhETEfqRcTsdwQVs72jVGXNSAByCqsg8mF9AHOqf5MRKWcLcrEuzTBYMEzng+eGNWOHMLQdSfI6bgWc9E4aJzXWDSHBMTGptjbhJ1MbziP/Il7fhsEiUQVCJ8DRwXvsm1xNNIjEbNV0eRHT55/qJ52sVawgUPt0P1ug= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736411593; c=relaxed/simple; bh=WEpWXc50XkA14XPM7inT92wM7gnl6gKV8HwSldSzp0g=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=mffnazTJB3q8YhLLXego1xkG85OdZ+xVkZwpILhG2zMHVcsqgJ82Php31JRV+7fq9QBRD1pGUVhtyIJPcnqYhdhsGyWC3I45jIWYcDE57Dh4t+3tcl/BrS3DldpCr/TdQGs/ofRaNf29Wyg5/jrddqEmI8YfiQp5iqRImZFvB8g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=gj3ZmnB6; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="gj3ZmnB6" Received: (qmail 25617 invoked by uid 109); 9 Jan 2025 08:33:11 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=WEpWXc50XkA14XPM7inT92wM7gnl6gKV8HwSldSzp0g=; b=gj3ZmnB66fGCIbcdqxS0A49x4QN1k6OIOyA9/WBZMitPeuE1e8fdpLX0k8y30328JZRPmpcN0zB+6RQXgNwDudvzUbHNjClbHnKpjBbi8LD0Ur8wDM2eIWt+hX1sda07abij0kuvwhw60IKBTfLSX6cOh/Ta8OHeMHE+1lSDcIaLRdKlK5VAdc5Q/CV3UG2NiqJpgPdB2+SKr/uV9/PtH7b3dqFbdpcbAXwq8EObRON5mdzh5vGvLvoLzPJ1En8sftrepQz1KiBXuxlhVICMCVooNcEANHGYa7YPXNhSbD0ODDsAg9BPEm/+x7X3CLUIGSFRFEBm54dIWP1mthofDQ== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Jan 2025 08:33:11 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20119 invoked by uid 111); 9 Jan 2025 08:33:10 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Jan 2025 03:33:10 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Jan 2025 03:33:10 -0500 From: Jeff King To: Git List Cc: Junio C Hamano , Wink Saville Subject: [PATCH 03/14] tree-diff: clear parent array in path_appendnew() Message-ID: <20250109083310.GC2748836@coredump.intra.peff.net> References: <20250109082723.GA2748497@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20250109082723.GA2748497@coredump.intra.peff.net> All of the other functions which allocate a combine_diff_path struct zero out the parent array, but this code path does not. There's no bug, since our caller will fill in most of the fields. But leaving the unused fields (like combine_diff_parent.path) uninitialized makes working with the struct more error-prone than it needs to be. Let's just zero the parent field to be consistent with the combine_diff_path_new() allocator. Signed-off-by: Jeff King --- tree-diff.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tree-diff.c b/tree-diff.c index d9237ffd9b..24f7b5912c 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -151,8 +151,6 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_ * process(p); * p = pprev; * ; don't forget to free tail->next in the end - * - * p->parent[] remains uninitialized. */ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last, int nparent, const struct strbuf *base, const char *path, int pathlen, @@ -187,6 +185,8 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last, p->mode = mode; oidcpy(&p->oid, oid ? oid : null_oid()); + memset(p->parent, 0, sizeof(p->parent[0]) * nparent); + return p; } From patchwork Thu Jan 9 08:42:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13932187 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BEE9721504D for ; Thu, 9 Jan 2025 08:42:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736412153; cv=none; b=WBdsW0UennKfBaJTNa64KkRzc/LUEeSuRgy7/NbfTjj5ftKdCYrNWBSHxsF0JRvC63wtpzUavAhIttsmrBPRvIYITKrkdShGQGMRGghpw2Mwgc9srylAfbAt5PncinpcdyJrDpJyVwbq0EUUnnGdkO3+Zkj3W+vscd4nQLD8rrw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736412153; c=relaxed/simple; bh=q+Sfo1geaO7t/PMKaTDhR02Bunb3zzNTSgO0ZYa+QnU=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=vEyX53iP6OGzkmxdRLQO39jiqiM4Ydn2SxCUgxo+t7rWplOR44377ZtFasjeKf9RdApGmDXws6dGh2UcEQmva6BvNYZ6lvQtDrnc9NME5a094OPuMFIAlfVVsXPf5+xYo0YUEi8Lsddl14Yty3VryCxdgZ9Ac0VubNBshMzxh8E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=YisQa8R8; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="YisQa8R8" Received: (qmail 25656 invoked by uid 109); 9 Jan 2025 08:42:30 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=q+Sfo1geaO7t/PMKaTDhR02Bunb3zzNTSgO0ZYa+QnU=; b=YisQa8R8YT/TQbcZtkaeZHE20fVrSKUu8lxvpaDc96XhFnz9yERKb/QQtEW+vGMNLuklZZCo9TnnTz3QcxjutbRuNra7yTB9UNjFzV2ifXxERgLaAWIE5KbNB5rq5UqRB0OrhUM45Prv4Ja6ZLcko9uV1ZH9CdBXlEj/kw5wBoSXn44/Yg2SYKD/Z8KuTt1fJ38KV0iA8sW2puuyPu6WvQ2l1JURfgBZ4dUrMmQbHqBlwBCWWkXp9G0plBXUY6rCVclpD61BSyBf8qGxPFkd7Pp6BNC4mZ4z3S1XrV1NU26FCqULVzO/Tjse93k4jst9pK7eY63s+8sa93WEu1uMog== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Jan 2025 08:42:30 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20288 invoked by uid 111); 9 Jan 2025 08:42:30 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Jan 2025 03:42:30 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Jan 2025 03:42:29 -0500 From: Jeff King To: Git List Cc: Junio C Hamano , Wink Saville Subject: [PATCH 04/14] combine-diff: use pointer for parent paths Message-ID: <20250109084229.GD2748836@coredump.intra.peff.net> References: <20250109082723.GA2748497@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20250109082723.GA2748497@coredump.intra.peff.net> Commit d76ce4f734 (log,diff-tree: add --combined-all-paths option, 2019-02-07) added a "path" field to each combine_diff_parent struct. It's defined as a strbuf, but this is overkill. We never manipulate the buffer beyond inserting a single string into it. And in fact there's a small bug: we zero the parent structs, including the path strbufs. For the 0th parent, we strbuf_init() the strbuf before adding to it. But for subsequent parents, we never do the init. This is technically violating the strbuf API, though the code there is resilient enough to handle this zero'd state. This patch switches us to just store an allocated string pointer. Zeroing it is enough to properly initialize it there (modulo the usual assumption we make that a NULL pointer is all-zeroes). And as a bonus, we can just check for a non-NULL value to see if it is present, rather than repeating the combined_all_paths logic at each site. Signed-off-by: Jeff King --- combine-diff.c | 30 +++++++++++------------------- diff.h | 2 +- 2 files changed, 12 insertions(+), 20 deletions(-) diff --git a/combine-diff.c b/combine-diff.c index 45548fd438..ae3cbfc699 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -60,9 +60,7 @@ static struct combine_diff_path *intersect_paths( if (combined_all_paths && filename_changed(p->parent[n].status)) { - strbuf_init(&p->parent[n].path, 0); - strbuf_addstr(&p->parent[n].path, - q->queue[i]->one->path); + p->parent[n].path = xstrdup(q->queue[i]->one->path); } *tail = p; tail = &p->next; @@ -83,9 +81,7 @@ static struct combine_diff_path *intersect_paths( /* p->path not in q->queue[]; drop it */ *tail = p->next; for (j = 0; j < num_parent; j++) - if (combined_all_paths && - filename_changed(p->parent[j].status)) - strbuf_release(&p->parent[j].path); + free(p->parent[j].path); free(p); continue; } @@ -101,8 +97,7 @@ static struct combine_diff_path *intersect_paths( p->parent[n].status = q->queue[i]->status; if (combined_all_paths && filename_changed(p->parent[n].status)) - strbuf_addstr(&p->parent[n].path, - q->queue[i]->one->path); + p->parent[n].path = xstrdup(q->queue[i]->one->path); tail = &p->next; i++; @@ -987,8 +982,9 @@ static void show_combined_header(struct combine_diff_path *elem, if (rev->combined_all_paths) { for (i = 0; i < num_parent; i++) { - char *path = filename_changed(elem->parent[i].status) - ? elem->parent[i].path.buf : elem->path; + const char *path = elem->parent[i].path ? + elem->parent[i].path : + elem->path; if (elem->parent[i].status == DIFF_STATUS_ADDED) dump_quoted_path("--- ", "", "/dev/null", line_prefix, c_meta, c_reset); @@ -1269,12 +1265,10 @@ static void show_raw_diff(struct combine_diff_path *p, int num_parent, struct re for (i = 0; i < num_parent; i++) if (rev->combined_all_paths) { - if (filename_changed(p->parent[i].status)) - write_name_quoted(p->parent[i].path.buf, stdout, - inter_name_termination); - else - write_name_quoted(p->path, stdout, - inter_name_termination); + const char *path = p->parent[i].path ? + p->parent[i].path : + p->path; + write_name_quoted(path, stdout, inter_name_termination); } write_name_quoted(p->path, stdout, line_termination); } @@ -1636,9 +1630,7 @@ void diff_tree_combined(const struct object_id *oid, struct combine_diff_path *tmp = paths; paths = paths->next; for (i = 0; i < num_parent; i++) - if (rev->combined_all_paths && - filename_changed(tmp->parent[i].status)) - strbuf_release(&tmp->parent[i].path); + free(tmp->parent[i].path); free(tmp); } diff --git a/diff.h b/diff.h index 5cddd5a870..f5f6ea00fb 100644 --- a/diff.h +++ b/diff.h @@ -480,7 +480,7 @@ struct combine_diff_path { char status; unsigned int mode; struct object_id oid; - struct strbuf path; + char *path; } parent[FLEX_ARRAY]; }; #define combine_diff_path_size(n, l) \ From patchwork Thu Jan 9 08:42:48 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13932188 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8C1AE21504D for ; Thu, 9 Jan 2025 08:42:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736412172; cv=none; b=b8DRVydoPVRjdtP8pvJCziVekrUZ8w5tnrEO+SzGsy09bKvu6OfBf9KUoHa6CwAV9jRR2HC7gfR34zKeiEn2J3AQLNYA4EyIq3uHDKwWzLSxBDq4uaMU5Nvv1tKMcHOa329fHUhYNbIArqz297FFti0nJhTPOkk5OyVNmaNLmWQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736412172; c=relaxed/simple; bh=gFvL8BljVac3ilqMROs6Y0Fh9+0/DlQHjo3x0wR8BoQ=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=q/JHIfHGh20NY4npEJYe+Ancj8liTcZEVMQyJB2CL9+u2sc4zs2PEuJxZ7EYup7MSGwY+YhIPsuQRyQtvTlEB22UgyRpF5+jLBXVgSESNfFdfk8fjvO7saVW2XjqF9Lk7F+yBkOFGTTyoj6JUqkczNIutDJM9VJOr8s3Ofo7juE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=AdYOmrrI; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="AdYOmrrI" Received: (qmail 25665 invoked by uid 109); 9 Jan 2025 08:42:49 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=gFvL8BljVac3ilqMROs6Y0Fh9+0/DlQHjo3x0wR8BoQ=; b=AdYOmrrI+wtoLEep+PmwguKNg50TEbd1bpNWNEB9NgF6GqIL9FUks8fLTo6OwCFphL1G9KIkEdd7yknn9fu95DdpVr7cnG1+tIXzzoGDFFQN/b3D093fp8N+p0QWuaW0QwM7POvM8WFl//TA1Ynt1DTIqEwrr7mEyDiwMFIQn3QdOICQ/uurYN6CqgRG91470NTsWarXsmt1GKETN3qtFzV2mu3vL3++G9v3HGDDF/eYVmunoSei6Erj8DCMhbePFB8JRrpG5pfbDEOg+FLBVCh80oSvMNxvqBPG6pMs23Gu+22GNAYDFK0Gt4fMnxZI6oqbOHFF2hxpCCMCXryBFg== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Jan 2025 08:42:49 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20293 invoked by uid 111); 9 Jan 2025 08:42:49 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Jan 2025 03:42:49 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Jan 2025 03:42:48 -0500 From: Jeff King To: Git List Cc: Junio C Hamano , Wink Saville Subject: [PATCH 05/14] diff: add a comment about combine_diff_path.parent.path Message-ID: <20250109084248.GE2748836@coredump.intra.peff.net> References: <20250109082723.GA2748497@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20250109082723.GA2748497@coredump.intra.peff.net> We only fill in the per-parent "path" field when it differs from what's in combine_diff_path.path (and even then only when the option is appropriate). Let's document that. Suggested-by: Wink Saville Signed-off-by: Jeff King --- diff.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/diff.h b/diff.h index f5f6ea00fb..60e7db4ad6 100644 --- a/diff.h +++ b/diff.h @@ -480,6 +480,12 @@ struct combine_diff_path { char status; unsigned int mode; struct object_id oid; + /* + * This per-parent path is filled only when doing a combined + * diff with revs.combined_all_paths set, and only if the path + * differs from the post-image (e.g., a rename or copy). + * Otherwise it is left NULL. + */ char *path; } parent[FLEX_ARRAY]; }; From patchwork Thu Jan 9 08:44:21 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13932189 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C5570215051 for ; Thu, 9 Jan 2025 08:44:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736412265; cv=none; b=Sgd0eiyL8RXZADhMVSFjhppgBC+x5XkC12lTcrf6KnzmD3nnKvSGp1WwDpLB25P4F0cis+TkueX9PmEvSk1+OIcOFxX5PtQrw5R0BP5WaYM2TA4BTPiRp5al9woIoWITm2c/UVbmdMgZte9Md83XZyl8fC+QiosjCKaZX86hxCs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736412265; c=relaxed/simple; bh=WkNA5paKm3lbA0PCaaUwsWVG8+W30SxcWMzSUvjl12w=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=J6AIl0dC7/+YhPHEQgXrQ+fBIILVh48zcqOMQB9rzIN08b/ee0sJWIZr19fo3D/kPx6dN7UzRFeJ0yGtut628m2XOmvBfeIOfFI/UgFokbGVrEQOmnxuQLZy6MKpWKViKeAiYGRCQmUmrXge4Qrsq9grwRs7sZzsJtNvfbt4Jn8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=esRtMa3z; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="esRtMa3z" Received: (qmail 25681 invoked by uid 109); 9 Jan 2025 08:44:22 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=WkNA5paKm3lbA0PCaaUwsWVG8+W30SxcWMzSUvjl12w=; b=esRtMa3zWSterUBkqNyEIAmVop1Vqc9rdLdFO1F3qpxn5WUt75G6S4/HgmHup8t562g6CDDZl6/YBCZusBJIzMe3plF59MCEfi+xAe7aiX2rfcUFJoExCpM+sn6x+xpn2ztacyoqm5aVMOoNiZ8LV77/6b0QltidbFuM0ukbBMfbxgTD669eMD7x8il20JCsJmJ16ZrnB4QGKcWqyqXLXl7ozBNqRQ70Eh2dU1BL6l0d/LgMH+zwi/Qvm57ZqsfoMwkdy3zrH8upLacX+vDhB56dS0RDyUQr6jopOOCUYQqVb0EjY1ma8YDomXI1+342tzuANRMIPW0g1/iuYxLRjA== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Jan 2025 08:44:22 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20298 invoked by uid 111); 9 Jan 2025 08:44:22 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Jan 2025 03:44:22 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Jan 2025 03:44:21 -0500 From: Jeff King To: Git List Cc: Junio C Hamano , Wink Saville Subject: [PATCH 06/14] run_diff_files(): de-mystify the size of combine_diff_path struct Message-ID: <20250109084421.GF2748836@coredump.intra.peff.net> References: <20250109082723.GA2748497@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20250109082723.GA2748497@coredump.intra.peff.net> We allocate a combine_diff_path struct with space for 5 parents. Why 5? The history is not particularly enlightening. The allocation comes from b4b1550315 (Don't instantiate structures with FAMs., 2006-06-18), which just switched to xmalloc from a stack struct with 5 elements. That struct changed to 5 from 4 in 2454c962fb (combine-diff: show mode changes as well., 2006-02-06), when we also moved from storing raw sha1 bytes to the combine_diff_parent struct. But no explanation is given. That 4 comes from the earliest code in ea726d02e9 (diff-files: -c and --cc options., 2006-01-28). One might guess it is for the 4 stages we can store in the index. But this code path only ever diffs the current state against stages 2 and 3. So we only need two slots. And it's easy to see this is still the case. We fill the parent slots by subtracting 2 from the ce_stage() values, ignoring values below 2. And since ce_stage() is only 2 bits, there are 4 values, and thus we need 2 slots. Let's use the correct value (saving a tiny bit of memory) and add a comment explaining what's going on (saving a tiny bit of programmer brain power). Arguably we could use: 1 + (STAGEMASK >> STAGESHIFT) - 2 which lets the compiler enforce that we will not go out-of-bounds if we see an unexpected value from ce_stage(). But that is more confusing to explain, and the constant "2" is baked into other parts of the function. It is a fundamental constant, not something where somebody might bump a macro and forget to update this code. Signed-off-by: Jeff King --- diff-lib.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/diff-lib.c b/diff-lib.c index 471ef99614..353b473ed5 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -166,8 +166,13 @@ void run_diff_files(struct rev_info *revs, unsigned int option) wt_mode = 0; } + /* + * Allocate space for two parents, which will come from + * index stages #2 and #3, if present. Below we'll fill + * these from (stage - 2). + */ dpath = combine_diff_path_new(ce->name, ce_namelen(ce), - wt_mode, null_oid(), 5); + wt_mode, null_oid(), 2); while (i < entries) { struct cache_entry *nce = istate->cache[i]; From patchwork Thu Jan 9 08:46:49 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13932190 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B7062F2D for ; Thu, 9 Jan 2025 08:46:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736412413; cv=none; b=Btmerf6y0SfXOcg7Vcksq2UcwmRsyV+HC8D3i9Taio5s/XcdtDQacupry8mcuvt8HIivrhKfbIppl+z2j5R/Yhu/MlYEkUiw6yI/n6IDNJEqMtmw1ptAW534dLXgSme4QscGhMVAYJTMUlU2HVh1tuN75bSkZ1fmv2nh3iPvpNc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736412413; c=relaxed/simple; bh=A0tgNetaVLpeUdEDWB4z3h7ZaFLGeW55new43be59us=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=Hju1sG166xDrFORJElwL4tRsXWij5yEPFx4z1J5YyDJabNRMX7jh94k6rVBdyYcVXT8cXIGa91ZoH0Mo1f3+anNUk+QUkLzrJC//vyhAHpuaGO8klzDS5wzaO4pKcmOC49E9zFCkxnr3QrqScOMssRmHm941dt8Tm1u5MPBie9Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=Nlw+0bwI; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="Nlw+0bwI" Received: (qmail 25695 invoked by uid 109); 9 Jan 2025 08:46:50 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:content-transfer-encoding:in-reply-to; s=20240930; bh=A0tgNetaVLpeUdEDWB4z3h7ZaFLGeW55new43be59us=; b=Nlw+0bwIXvPxb7uoOuv0Xf2KOqdEUIOR6wMxOA2RMBO9XQxZCuHtgpPtF0h39uyELpe0y2Dz7nOMc7lonbSnTF7MAJXwqhT3OkEmxuN3LUzzeFkEtay6KGm662Vr7POAvXlxqdsNRiShbj90yvVYPcXFksTENmWJqwG4CcSjJ22TKmuajOAubUxxEyP6jwBl2H7Lzg7RyImRg3OyhXRGV00YsjOEFHg2k/7oHOyBxMaVXKceK2K9Kqt/CA6e4MXHO7UKmkcnGiAnFraabo9WMybhqVUTA3CXUpCvvW+qlhLjhG2hrkx24dICqR66FHVVT/APDxKSh698MlqZFuDj/g== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Jan 2025 08:46:50 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20340 invoked by uid 111); 9 Jan 2025 08:46:50 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Jan 2025 03:46:50 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Jan 2025 03:46:49 -0500 From: Jeff King To: Git List Cc: Junio C Hamano , Wink Saville Subject: [PATCH 07/14] tree-diff: drop path_appendnew() alloc optimization Message-ID: <20250109084649.GG2748836@coredump.intra.peff.net> References: <20250109082723.GA2748497@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20250109082723.GA2748497@coredump.intra.peff.net> When we're diffing trees, we create a list of combine_diff_path structs that represent changed paths. We allocate each struct and add it to the list with path_appendnew(), which we then feed to opt->pathchange(). That function tells us whether the path is of interest or not; if not, then we can throw away the struct we allocated. So there's an optimization to avoid extra allocations: instead of throwing away the new entry, we try to reuse it. If it was large enough to store the next path we care about, we can do so. And if not, we fall back to freeing and re-allocating a new struct. This comes from 72441af7c4 (tree-diff: rework diff_tree() to generate diffs for multiparent cases as well, 2014-04-07), where the goal was to have even the 2-parent diff code use the combine-diff infrastructure, but without taking a performance hit. The implementation causes some complexities in the interface (as we store the allocation length inside the "next" pointer), and prevents us from using the regular combine_diff_path_new() constructor. The complexity is mostly contained inside two functions, but it's worth re-evaluating how much it's helping. That commit claims it helps ~1% on generating two-parent diffs in linux.git. Here are the timings I get on the same command today ("old" is the current tip of master, and "new" has this patch applied): Benchmark 1: ./git.old log --raw --no-abbrev --no-renames v3.10..v3.11 Time (mean ± σ): 532.9 ms ± 5.8 ms [User: 472.7 ms, System: 59.6 ms] Range (min … max): 525.9 ms … 543.3 ms 10 runs Benchmark 2: ./git.new log --raw --no-abbrev --no-renames v3.10..v3.11 Time (mean ± σ): 538.3 ms ± 5.7 ms [User: 478.0 ms, System: 59.7 ms] Range (min … max): 528.5 ms … 545.3 ms 10 runs Summary ./git.old log --raw --no-abbrev --no-renames v3.10..v3.11 ran 1.01 ± 0.02 times faster than ./git.new log --raw --no-abbrev --no-renames v3.10..v3.11 So we do end up on average 1% faster, but with 2% of noise. I tried to focus more on diff performance by running the commit traversal separately, like: git rev-list v3.10..v3.11 >in and then timing just the diffs: Benchmark 1: ./git.old diff-tree --stdin -r