[v2,0/5] Tighten patch header parsing in patch-id

Message ID 20240730011738.4032377-1-gitster@pobox.com (mailing list archive)

Message

Junio C Hamano July 30, 2024, 1:17 a.m. UTC
Updates from v1

     - changed flag bits used internally from CPP macro to enum and
       added a bit of comments.

The patch-id command loops over a series of patches, picking up the
origin commit object name (which is on the "patch header" line) and
then computing the patch identifier out of the "patch" (series of
"diff") that follows the "patch header".

The parser is structured in a somewhat "strange" way.  It repeatedly
calls a single helper function get_one_patchid(), which either
returns as soon as a patch header is recognised, or skips ahead until
the "patch" part begins and then computes the "patch id" over that
part until it sees the next patch header.  The caller knows that its
first call yields just the "patch header" of the first patch, that
the second call computes the patch id for the first patch (whose
originating commit was obtained from the first call), and so on.

During the second and subsequent calls (i.e. after a patch header
has caused get_one_patchid() to return and the helper is called
again, expected to skip the commit log and find the patch whose
patch id we are asked to compute), we shouldn't look for the patch
header at all while skipping the log.  Otherwise, a line in the log
message that looks like a patch header can easily be mistaken for
the beginning of a new patch, as if the current message did not
have any patch text.
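
To illustrate the failure mode, here is a minimal standalone sketch.
It is not the code in builtin/patch-id.c: next_header(),
looks_like_patch_header() and sample_input are hypothetical names made
up for this illustration, the header check is simplified to "From " or
"commit " followed by 40 hex digits, and re-enabling header detection
once a "diff " line has been seen is my reading of the description
above rather than something copied from the patches.  Only the
GOPID_* flag names come from the series.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Flag bits named after the enum in this series. */
enum {
	GOPID_STABLE = (1<<0),		/* --stable (unused in this toy) */
	GOPID_VERBATIM = (1<<1),	/* --verbatim (unused in this toy) */
	GOPID_FIND_HEADER = (1<<2),	/* caller allows header detection */
};

#define HEX_A "aaaaaaaaaa" "aaaaaaaaaa" "aaaaaaaaaa" "aaaaaaaaaa"
#define HEX_B "bbbbbbbbbb" "bbbbbbbbbb" "bbbbbbbbbb" "bbbbbbbbbb"
#define HEX_C "cccccccccc" "cccccccccc" "cccccccccc" "cccccccccc"

/* Hypothetical input: line 2 is log text that merely looks like a header. */
const char *const sample_input[] = {
	"From " HEX_A " Mon Sep 17 00:00:00 2001",	/* 0: real header */
	"Subject: [PATCH] demo",			/* 1: log */
	"From " HEX_B " we inherited this bug",		/* 2: log, header-like */
	"diff --git a/f b/f",				/* 3: patch begins */
	"+new line",					/* 4: patch text */
	"From " HEX_C " Mon Sep 17 00:00:00 2001",	/* 5: next real header */
};

/* Simplified check: "commit " or "From " followed by 40 hex digits. */
static int looks_like_patch_header(const char *line)
{
	const char *p = line;
	int i;

	if (!strncmp(p, "commit ", 7))
		p += 7;
	else if (!strncmp(p, "From ", 5))
		p += 5;
	else
		return 0;
	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)p[i]))
			return 0;
	return 1;
}

/*
 * Scan lines[start..nr) and return the index of the first line taken
 * as a patch header, or nr if there is none.  Header detection is off
 * unless GOPID_FIND_HEADER is given, and (assumption) is switched on
 * once a "diff " line shows that the log message is over.
 */
int next_header(const char *const *lines, int nr, int start, unsigned flags)
{
	int find_header = !!(flags & GOPID_FIND_HEADER);
	int i;

	assert(lines && start >= 0 && start <= nr);
	for (i = start; i < nr; i++) {
		if (find_header && looks_like_patch_header(lines[i]))
			return i;
		if (!strncmp(lines[i], "diff ", 5))
			find_header = 1;
	}
	return nr;
}
```

With this sample, next_header(sample_input, 6, 0, GOPID_FIND_HEADER)
stops at line 0.  A follow-up call starting at line 1 without
GOPID_FIND_HEADER skips the header-like log line and stops at line 5,
while the same call with the flag set stops at line 2, which is the
misparse described above.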

This 5-patch series is organized as follows:

 - patch 1 is about setting the baseline.  We need to recognise the
   patch header produced by format-patch, log, and diff-tree --stdin.

 - patches 2 to 4 are a bit of code restructuring without changing
   the behaviour.

 - patch 5 stops looking for a patch header when we shouldn't, and
   adds tests.


Junio C Hamano (5):
  t4204: patch-id supports various input format
  patch-id: call flush_current_id() only when needed
  patch-id: make get_one_patchid() more extensible
  patch-id: rewrite code that detects the beginning of a patch
  patch-id: tighten code to detect the patch header

 builtin/patch-id.c  | 93 +++++++++++++++++++++++++++++++++++----------
 t/t4204-patch-id.sh | 40 +++++++++++++++++++
 2 files changed, 112 insertions(+), 21 deletions(-)

Range-diff against v1:
1:  c1ff38c0b8 = 1:  e68a30f6c9 t4204: patch-id supports various input format
2:  a201f344a6 = 2:  3afc63b210 patch-id: call flush_current_id() only when needed
3:  237f8910ca ! 3:  22dd5e7a5b patch-id: make get_one_patchid() more extensible
    @@ builtin/patch-id.c: static int scan_hunk_header(const char *p, int *p_before, in
      	return 1;
      }
      
    -+#define GOPID_STABLE   01
    -+#define GOPID_VERBATIM 02
    ++/*
    ++ * flag bits to control get_one_patchid()'s behaviour.
    ++ */
    ++enum {
    ++	GOPID_STABLE = (1<<0),		/* --stable */
    ++	GOPID_VERBATIM = (1<<1),	/* --verbatim */
    ++};
     +
      static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
     -			   struct strbuf *line_buf, int stable, int verbatim)
4:  d6d068c9dc = 4:  0cca1ed513 patch-id: rewrite code that detects the beginning of a patch
5:  51af73722c ! 5:  ef422df7c1 patch-id: tighten code to detect the patch header
    @@ Commit message
      ## builtin/patch-id.c ##
     @@ builtin/patch-id.c: static int scan_hunk_header(const char *p, int *p_before, int *p_after)
      
    - #define GOPID_STABLE   01
    - #define GOPID_VERBATIM 02
    -+#define GOPID_FIND_HEADER 04
    + /*
    +  * flag bits to control get_one_patchid()'s behaviour.
    ++ *
    ++ * STABLE/VERBATIM are given from the command line option as
    ++ * --stable/--verbatim.  FIND_HEADER conveys the internal state
    ++ * maintained by the caller to allow the function to avoid mistaking
    ++ * lines of log message before seeing the "diff" part as the beginning
    ++ * of the next patch.
    +  */
    + enum {
    + 	GOPID_STABLE = (1<<0),		/* --stable */
    + 	GOPID_VERBATIM = (1<<1),	/* --verbatim */
    ++	GOPID_FIND_HEADER = (1<<2),	/* stop at the beginning of patch message */
    + };
      
      static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
    - 			   struct strbuf *line_buf, unsigned flags)
    +@@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
      {
      	int stable = flags & GOPID_STABLE;
      	int verbatim = flags & GOPID_VERBATIM;

Comments

Patrick Steinhardt July 30, 2024, 5:12 a.m. UTC | #1
On Mon, Jul 29, 2024 at 06:17:33PM -0700, Junio C Hamano wrote:
>     Updates from v1
> 
>      - changed flag bits used internally from CPP macro to enum and
>        added a bit of comments.

The changes look like what I expected, so this series looks good to me
overall. Thanks!

Patrick