diff mbox series

[v3,1/5] version: refactor strbuf_sanitize()

Message ID 20241206124248.160494-2-christian.couder@gmail.com (mailing list archive)
State New
Headers show
Series Introduce a "promisor-remote" capability | expand

Commit Message

Christian Couder Dec. 6, 2024, 12:42 p.m. UTC
The git_user_agent_sanitized() function performs some sanitizing to
avoid special characters being sent over the line and possibly messing
up with the protocol or with the parsing on the other side.

Let's extract this sanitizing into a new strbuf_sanitize() function, as
we will want to reuse it in a following patch, and let's put it into
strbuf.{c,h}.

While at it, let's also make a few small improvements:
  - use 'size_t' for 'i' instead of 'int',
  - move the declaration of 'i' inside the 'for ( ... )',
  - use strbuf_detach() to explicitly detach the string contained by
    the 'sb' strbuf.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 strbuf.c  | 9 +++++++++
 strbuf.h  | 7 +++++++
 version.c | 9 ++-------
 3 files changed, 18 insertions(+), 7 deletions(-)

Comments

Junio C Hamano Dec. 7, 2024, 6:21 a.m. UTC | #1
Christian Couder <christian.couder@gmail.com> writes:

> +/*
> + * Trim and replace each character with ascii code below 32 or above
> + * 127 (included) using a dot '.' character. Useful for sending
> + * capabilities.
> + */
> +void strbuf_sanitize(struct strbuf *sb);

I am not getting "Useful for sending capabilities" here, and feel
that it is somewhat an unsubstantiated claim.  If some information
is going to be transferred (which the phrase "sending capabilities"
hints), I'd expect that we try as hard as possible not to lose
information, but redact-non-ASCII is the total opposite of "not
losing information".

> diff --git a/version.c b/version.c
> index 41b718c29e..951e6dca74 100644
> --- a/version.c
> +++ b/version.c
> @@ -24,15 +24,10 @@ const char *git_user_agent_sanitized(void)
>  
>  	if (!agent) {
>  		struct strbuf buf = STRBUF_INIT;
> -		int i;
>  
>  		strbuf_addstr(&buf, git_user_agent());
> -		strbuf_trim(&buf);
> -		for (i = 0; i < buf.len; i++) {
> -			if (buf.buf[i] <= 32 || buf.buf[i] >= 127)
> -				buf.buf[i] = '.';
> -		}
> -		agent = buf.buf;
> +		strbuf_sanitize(&buf);
> +		agent = strbuf_detach(&buf, NULL);
>  	}
>  
>  	return agent;

This is very faithful rewrite of the original.  The original had a
strbuf on stack, and after creating user-agent string in it, a
function scope static variable "agent" is made to point at it and
then the stack the strbuf was on is allowed to go out of scope.
Since the variable "agent" is holding onto the piece of memory, the
leak checker does not complain about anything.  The rewritten
version is leak-free for exactly the same reason, but because it
calls strbuf_detach() before the strbuf goes out of scope to
officially transfer the ownership to the variable "agent", it tells
what is going on to readers a lot more clearly.

Nicely done.

By the way, as we are trimming, I am very very much tempted to
squish a run of non-ASCII bytes into one dot, perhaps like

	void redact_non_printables(struct strbuf *sb)
	{
		size_t dst = 0;
                int skipped = 0;

        	strbuf_trim(sb);
		for (size_t src = 0; src < sb->len; src++) {
                	int ch = sb->buf[src];
			if (ch <= 32 && 127 <= ch) {
                                if (skipped)
                                	continue;
                        	ch = '.';
			}
                        sb->buf[dst++] = ch;
                        skipped = (ch == '.');
		}
	}

or even without strbuf_trim(), which would turn any leading or
trailing run of whitespaces into '.'.

But that is an improvement that can be easily done on top after the
dust settles and better left as #leftoverbits material.
diff mbox series

Patch

diff --git a/strbuf.c b/strbuf.c
index 3d2189a7f6..cccfdec0e3 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -1082,3 +1082,12 @@  void strbuf_strip_file_from_path(struct strbuf *sb)
 	char *path_sep = find_last_dir_sep(sb->buf);
 	strbuf_setlen(sb, path_sep ? path_sep - sb->buf + 1 : 0);
 }
+
+void strbuf_sanitize(struct strbuf *sb)
+{
+	strbuf_trim(sb);
+	for (size_t i = 0; i < sb->len; i++) {
+		if (sb->buf[i] <= 32 || sb->buf[i] >= 127)
+			sb->buf[i] = '.';
+	}
+}
diff --git a/strbuf.h b/strbuf.h
index 003f880ff7..884157873e 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -664,6 +664,13 @@  typedef int (*char_predicate)(char ch);
 void strbuf_addstr_urlencode(struct strbuf *sb, const char *name,
 			     char_predicate allow_unencoded_fn);
 
+/*
+ * Trim and replace each character with ascii code below 32 or above
+ * 127 (included) using a dot '.' character. Useful for sending
+ * capabilities.
+ */
+void strbuf_sanitize(struct strbuf *sb);
+
 __attribute__((format (printf,1,2)))
 int printf_ln(const char *fmt, ...);
 __attribute__((format (printf,2,3)))
diff --git a/version.c b/version.c
index 41b718c29e..951e6dca74 100644
--- a/version.c
+++ b/version.c
@@ -24,15 +24,10 @@  const char *git_user_agent_sanitized(void)
 
 	if (!agent) {
 		struct strbuf buf = STRBUF_INIT;
-		int i;
 
 		strbuf_addstr(&buf, git_user_agent());
-		strbuf_trim(&buf);
-		for (i = 0; i < buf.len; i++) {
-			if (buf.buf[i] <= 32 || buf.buf[i] >= 127)
-				buf.buf[i] = '.';
-		}
-		agent = buf.buf;
+		strbuf_sanitize(&buf);
+		agent = strbuf_detach(&buf, NULL);
 	}
 
 	return agent;