[4/5] patch-id: rewrite code that detects the beginning of a patch

Message ID 20240621231826.3280338-5-gitster@pobox.com (mailing list archive)
State Accepted
Commit 3f288b6fafbb8c76edf53fba7cea90d5a20a9c56
Series Tighten patch header parsing in patch-id

Commit Message

Junio C Hamano June 21, 2024, 11:18 p.m. UTC
The get_one_patchid() function reads input lines until it finds a
patch header (the line that begins a patch), whose beginning is one
of:

 (1) an "<object name>", which is "git diff-tree --stdin" shows;
 (2) "commit <object name>", which is "git log" shows; or
 (3) "From <object name>",  which is "git log --format=email" gives.

When it finds such a line, it returns to the caller, reporting the
<object name> it found, and the size of the "patch" it processed.

The caller then calls the function again, which then ignores the
commit log message, and then processes the lines in the patch part
until it hits another "beginning of a patch".
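
The three header forms listed above can be sketched outside of Git as
follows.  This is only an illustration under stated assumptions: the
sketch_* helpers are hypothetical stand-ins for skip_prefix() and
get_oid_hex(), and the object name is assumed to be 40 hex digits as
in a SHA-1 repository.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: object names are 40 hex digits (SHA-1 repository). */
#define SKETCH_HEXSZ 40

/* Hypothetical stand-in for Git's skip_prefix(). */
static int sketch_skip_prefix(const char *str, const char *prefix,
			      const char **out)
{
	size_t n = strlen(prefix);

	if (strncmp(str, prefix, n))
		return 0;
	*out = str + n;
	return 1;
}

/*
 * Hypothetical stand-in for get_oid_hex(): it only checks that the
 * string begins with SKETCH_HEXSZ hex digits and does not parse them.
 * The loop stops at a NUL byte, because NUL is not a hex digit.
 */
static int sketch_starts_with_oid(const char *p)
{
	int i;

	for (i = 0; i < SKETCH_HEXSZ; i++)
		if (!isxdigit((unsigned char)p[i]))
			return 0;
	return 1;
}

/* Return 1 if "line" looks like the beginning of a patch, 0 otherwise. */
static int sketch_is_patch_header(const char *line)
{
	const char *p = line;

	if (sketch_skip_prefix(line, "commit ", &p))
		; /* form (2), as shown by "git log" */
	else if (sketch_skip_prefix(line, "From ", &p))
		; /* form (3), as given by "git log --format=email" */
	/* otherwise form (1): the object name starts the line */
	return sketch_starts_with_oid(p);
}
```

With these helpers, a line such as "commit <40 hex digits>" is
reported as a header, while a "diff --git ..." line is not, because
its second character is not a hex digit.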

The above logic was fairly easy to see until 2bb73ae8 (patch-id: use
starts_with() and skip_prefix(), 2016-05-28) reorganized the code.
That change rolled an unrelated piece of logic into the same single
if() statement: the check that ignores the "\ No newline at the end"
marker, which came from 2485eab5 (git-patch-id: do not trip over "no
newline" markers, 2011-02-17) and has nothing to do with the "where
does the next patch begin?" logic.

Let's split it out.  The "\ No newline at the end" marker is part of
the patch, so it should not appear before we start reading the patch
part, and it does not belong to the detection of the patch header.
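
As a rough illustration, the marker check that is being split out
amounts to the predicate below.  The function name is hypothetical and
strncmp() stands in for Git's starts_with(); the condition itself
mirrors the one in the patch.

```c
#include <string.h>

/*
 * Hypothetical helper mirroring the condition used in the patch:
 *     starts_with(line, "\\ ") && 12 < strlen(line)
 * The line must begin with a backslash followed by a space and be
 * longer than 12 bytes.  The text after the prefix is not inspected.
 */
static int sketch_is_incomplete_line_marker(const char *line)
{
	return !strncmp(line, "\\ ", 2) && 12 < strlen(line);
}
```

Keeping this test apart from the header detection means a line
starting with "\ " is only considered once we are already inside the
patch part.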

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/patch-id.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

Comments

Patrick Steinhardt July 29, 2024, 12:03 p.m. UTC | #1
On Fri, Jun 21, 2024 at 04:18:25PM -0700, Junio C Hamano wrote:
> The get_one_patchid() function reads input lines until it finds a
> patch header (the line that begins a patch), whose beginning is one
> of:
> 
>  (1) an "<object name>", which is "git diff-tree --stdin" shows;
>  (2) "commit <object name>", which is "git log" shows; or
>  (3) "From <object name>",  which is "git log --format=email" gives.

All of these items should probably say "which is what ...", where "what"
is what is missing.

Patrick

Patch

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 128e0997d8..a649966f31 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -80,16 +80,19 @@  static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		const char *p = line;
 		int len;
 
-		/* Possibly skip over the prefix added by "log" or "format-patch" */
-		if (!skip_prefix(line, "commit ", &p) &&
-		    !skip_prefix(line, "From ", &p) &&
-		    starts_with(line, "\\ ") && 12 < strlen(line)) {
+		/*
+		 * If we see a line that begins with "<object name>",
+		 * "commit <object name>" or "From <object name>", it is
+		 * the beginning of a patch.  Return to the caller, as
+		 * we are done with the one we have been processing.
+		 */
+		if (skip_prefix(line, "commit ", &p))
+			;
+		else if (skip_prefix(line, "From ", &p))
+			;
+		if (!get_oid_hex(p, next_oid)) {
 			if (verbatim)
 				the_hash_algo->update_fn(&ctx, line, strlen(line));
-			continue;
-		}
-
-		if (!get_oid_hex(p, next_oid)) {
 			found_next = 1;
 			break;
 		}
@@ -130,6 +133,16 @@  static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 				break;
 		}
 
+		/*
+		 * A hunk about an incomplete line may have this
+		 * marker at the end, which should just be ignored.
+		 */
+		if (starts_with(line, "\\ ") && 12 < strlen(line)) {
+			if (verbatim)
+				the_hash_algo->update_fn(&ctx, line, strlen(line));
+			continue;
+		}
+
 		if (diff_is_binary) {
 			if (starts_with(line, "diff ")) {
 				diff_is_binary = 0;