diff mbox series

[v5] receive-pack.c: consolidate find header logic

Message ID pull.1125.v5.git.git.1641430309837.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v5] receive-pack.c: consolidate find header logic | expand

Commit Message

John Cai Jan. 6, 2022, 12:51 a.m. UTC
From: John Cai <johncai86@gmail.com>

There are two functions that have very similar logic of finding a header
value. find_commit_header, and find_header. We can conslidate the logic
by introducing a new function find_header_mem, which is equivalent to
find_commit_header except it takes a len parameter that determines how
many bytes will be read. find_commit_header and find_header can then both
call find_header_mem.

This reduces duplicate logic, as the logic for finding header values
can now all live in one place.

Signed-off-by: John Cai <johncai86@gmail.com>
---
    Consolidate find_header logic into one function
    
    This addresses the NEEDSWORK comment in builtin/receive-pack.c:
    
     /**
       * NEEDSWORK: reuse find_commit_header() from jk/commit-author-parsing
       * after dropping "_commit" from its name and possibly moving it out
       * of commit.c
       **/
    
    
    There are two functions that have very similar logic of finding a header
    value. find_commit_header, and find_header. We can conslidate the logic
    by introducing a new function find_header_mem, which is equivalent to
    find_commit_header except it takes a len parameter that determines how
    many bytes will be read. find_commit_header and find_header can then
    both call find_header_mem.
    
    This reduces duplicate logic, as the logic for finding header values can
    now all live in one place.
    
    Changes since v4:
    
     * adjust verbiage of NEEDSWORK comment block
    
    Changes since v3:
    
     * added NEEDSWORK block detailing what needs to be done to clean up
       find_header_mem
     * fixed verbiage in commit message
     * adjusted style of an if block (based on Junio's feedback)

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1125%2Fjohn-cai%2Fjc%2Freplace-find-header-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1125/john-cai/jc/replace-find-header-v5
Pull-Request: https://github.com/git/git/pull/1125

Range-diff vs v4:

 1:  a7b00022b00 ! 1:  02da3136c43 receive-pack.c: consolidate find header logic
     @@ commit.c: struct commit_list **commit_list_append(struct commit *commit,
       
      -	while (line) {
      +	/*
     -+	 * NEEDSWORK: Between line[0] and msg[len], there may not be a LF nor NUL
     -+	 * at all, and strchrnul() will scan beyond the range we were given
     -+	 * Make this operation safer and abide by the contract to only read up to len.
     ++	 * NEEDSWORK: It's possible for strchrnul() to scan beyond the range
     ++	 * given by len. However, current callers are safe because they compute
     ++	 * len by scanning a NUL-terminated block of memory starting at msg.
     ++	 * Make this function safer by checking if the input is NUL-terminated
     ++	 * or has a NL between line[0] and msg[len].
      +	 */
      +	while (line && line < msg + len) {
       		const char *eol = strchrnul(line, '\n');


 builtin/receive-pack.c | 33 ++++++++++-----------------------
 commit.c               | 16 ++++++++++++++--
 commit.h               |  5 +++++
 3 files changed, 29 insertions(+), 25 deletions(-)


base-commit: c8b2ade48c204690119936ada89cd938c476c5c2

Comments

Junio C Hamano Jan. 6, 2022, 7:40 p.m. UTC | #1
"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

>      ++	 * NEEDSWORK: It's possible for strchrnul() to scan beyond the range
>      ++	 * given by len. However, current callers are safe because they compute
>      ++	 * len by scanning a NUL-terminated block of memory starting at msg.

OK.  Because the caller didn't die with SIGBUS owhen they scanned
the memory to figure out len, we know scanning between msg[0] and
msg[len] is safe.

>      ++	 * Make this function safer by checking if the input is NUL-terminated
>      ++	 * or has a NL between line[0] and msg[len].

This comment however feels wrong.  The function is _safe_ if the
input is NUL-terminated already.  And it is very expected that the
msg[] has LF between msg[0] and msg[len]---after all that is why we
use strchrnul() to scan it.

Perhaps replace it with something like "But nevertheless it would be
better to make sure the function does not look at msg beyond len
that was provided by the caller", perhaps?

The most problematic case happens when msg points into somewhere
inside a page and &msg[len] is still inside that page, but (1) there
is no LF nor NUL in the page after &msg[0], and (2) the page after
the page &msg[0] and &msg[len] are in is not mapped.  Letting
strchrnul() scan msg[] beyond len will lead us to attempt to read
the first byte of the next page and killed with SIGBUS (I recall
that I had to debug a mysterious breakage in strlen or strchr on
SunOS, which tried to grab 8-byte at a time and crossed page
boundary to get an unnecessary SIGBUS).
diff mbox series

Patch

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 9f4a0b816cf..5c2732a0d07 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -581,32 +581,19 @@  static char *prepare_push_cert_nonce(const char *path, timestamp_t stamp)
 	return strbuf_detach(&buf, NULL);
 }
 
-/*
- * NEEDSWORK: reuse find_commit_header() from jk/commit-author-parsing
- * after dropping "_commit" from its name and possibly moving it out
- * of commit.c
- */
 static char *find_header(const char *msg, size_t len, const char *key,
 			 const char **next_line)
 {
-	int key_len = strlen(key);
-	const char *line = msg;
-
-	while (line && line < msg + len) {
-		const char *eol = strchrnul(line, '\n');
-
-		if ((msg + len <= eol) || line == eol)
-			return NULL;
-		if (line + key_len < eol &&
-		    !memcmp(line, key, key_len) && line[key_len] == ' ') {
-			int offset = key_len + 1;
-			if (next_line)
-				*next_line = *eol ? eol + 1 : eol;
-			return xmemdupz(line + offset, (eol - line) - offset);
-		}
-		line = *eol ? eol + 1 : NULL;
-	}
-	return NULL;
+	size_t out_len;
+	const char *val = find_header_mem(msg, len, key, &out_len);
+
+	if (!val)
+		return NULL;
+
+	if (next_line)
+		*next_line = val + out_len + 1;
+
+	return xmemdupz(val, out_len);
 }
 
 /*
diff --git a/commit.c b/commit.c
index a348f085b2b..f5224e65de2 100644
--- a/commit.c
+++ b/commit.c
@@ -1631,12 +1631,20 @@  struct commit_list **commit_list_append(struct commit *commit,
 	return &new_commit->next;
 }
 
-const char *find_commit_header(const char *msg, const char *key, size_t *out_len)
+const char *find_header_mem(const char *msg, size_t len,
+			const char *key, size_t *out_len)
 {
 	int key_len = strlen(key);
 	const char *line = msg;
 
-	while (line) {
+	/*
+	 * NEEDSWORK: It's possible for strchrnul() to scan beyond the range
+	 * given by len. However, current callers are safe because they compute
+	 * len by scanning a NUL-terminated block of memory starting at msg.
+	 * Make this function safer by checking if the input is NUL-terminated
+	 * or has a NL between line[0] and msg[len].
+	 */
+	while (line && line < msg + len) {
 		const char *eol = strchrnul(line, '\n');
 
 		if (line == eol)
@@ -1653,6 +1661,10 @@  const char *find_commit_header(const char *msg, const char *key, size_t *out_len
 	return NULL;
 }
 
+const char *find_commit_header(const char *msg, const char *key, size_t *out_len)
+{
+	return find_header_mem(msg, strlen(msg), key, out_len);
+}
 /*
  * Inspect the given string and determine the true "end" of the log message, in
  * order to find where to put a new Signed-off-by trailer.  Ignored are
diff --git a/commit.h b/commit.h
index 3ea32766bcb..38cc5426615 100644
--- a/commit.h
+++ b/commit.h
@@ -290,12 +290,17 @@  void free_commit_extra_headers(struct commit_extra_header *extra);
 
 /*
  * Search the commit object contents given by "msg" for the header "key".
+ * Reads up to "len" bytes of "msg".
  * Returns a pointer to the start of the header contents, or NULL. The length
  * of the header, up to the first newline, is returned via out_len.
  *
  * Note that some headers (like mergetag) may be multi-line. It is the caller's
  * responsibility to parse further in this case!
  */
+const char *find_header_mem(const char *msg, size_t len,
+			const char *key,
+			size_t *out_len);
+
 const char *find_commit_header(const char *msg, const char *key,
 			       size_t *out_len);