diff mbox series

[1/4] version: refactor strbuf_sanitize()

Message ID 20240731134014.2299361-2-christian.couder@gmail.com (mailing list archive)
State New, archived
Headers show
Series Introduce a "promisor-remote" capability | expand

Commit Message

Christian Couder July 31, 2024, 1:40 p.m. UTC
The git_user_agent_sanitized() function performs some sanitizing to
avoid special characters being sent over the line and possibly messing
up with the protocol or with the parsing on the other side.

Let's extract this sanitizing into a new strbuf_sanitize() function, as
we will want to reuse it in a following patch, and let's put it into
strbuf.{c,h}.

While at it, let's also make a few small improvements:
  - use 'size_t' for 'i' instead of 'int',
  - move the declaration of 'i' inside the 'for ( ... )',
  - use strbuf_detach() to explicitely detach the string contained by
    the 'sb' strbuf.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 strbuf.c  | 9 +++++++++
 strbuf.h  | 7 +++++++
 version.c | 9 ++-------
 3 files changed, 18 insertions(+), 7 deletions(-)

Comments

Junio C Hamano July 31, 2024, 5:18 p.m. UTC | #1
Christian Couder <christian.couder@gmail.com> writes:

> diff --git a/strbuf.c b/strbuf.c
> index 3d2189a7f6..cccfdec0e3 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -1082,3 +1082,12 @@ void strbuf_strip_file_from_path(struct strbuf *sb)
>  	char *path_sep = find_last_dir_sep(sb->buf);
>  	strbuf_setlen(sb, path_sep ? path_sep - sb->buf + 1 : 0);
>  }
> +
> +void strbuf_sanitize(struct strbuf *sb)
> +{
> +	strbuf_trim(sb);
> +	for (size_t i = 0; i < sb->len; i++) {
> +		if (sb->buf[i] <= 32 || sb->buf[i] >= 127)
> +			sb->buf[i] = '.';
> +	}
> +}

This looked a bit _too_ specific for the use of the transport layer
(which raises the question if it should even live in strbuf.[ch]).
It also made me wonder if different callers likely want to have
different variants (e.g., do not trim, only trim at the tail, squash
a run of unprintables into a single '.', use '?'  instead of '.',
etc., etc.).

It turns out that there is only *one* existing caller that gets
replaced with this "common" version, which made it a Meh to me.

Let's hope that there will be many new callers to make this step
worthwhile.

>  __attribute__((format (printf,1,2)))
>  int printf_ln(const char *fmt, ...);
>  __attribute__((format (printf,2,3)))
> diff --git a/version.c b/version.c
> index 41b718c29e..951e6dca74 100644
> --- a/version.c
> +++ b/version.c
> @@ -24,15 +24,10 @@ const char *git_user_agent_sanitized(void)
>  
>  	if (!agent) {
>  		struct strbuf buf = STRBUF_INIT;
> -		int i;
>  
>  		strbuf_addstr(&buf, git_user_agent());
> -		strbuf_trim(&buf);
> -		for (i = 0; i < buf.len; i++) {
> -			if (buf.buf[i] <= 32 || buf.buf[i] >= 127)
> -				buf.buf[i] = '.';
> -		}
> -		agent = buf.buf;
> +		strbuf_sanitize(&buf);
> +		agent = strbuf_detach(&buf, NULL);
>  	}
>  
>  	return agent;
Christian Couder Aug. 20, 2024, 11:29 a.m. UTC | #2
On Wed, Jul 31, 2024 at 7:18 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > diff --git a/strbuf.c b/strbuf.c
> > index 3d2189a7f6..cccfdec0e3 100644
> > --- a/strbuf.c
> > +++ b/strbuf.c
> > @@ -1082,3 +1082,12 @@ void strbuf_strip_file_from_path(struct strbuf *sb)
> >       char *path_sep = find_last_dir_sep(sb->buf);
> >       strbuf_setlen(sb, path_sep ? path_sep - sb->buf + 1 : 0);
> >  }
> > +
> > +void strbuf_sanitize(struct strbuf *sb)
> > +{
> > +     strbuf_trim(sb);
> > +     for (size_t i = 0; i < sb->len; i++) {
> > +             if (sb->buf[i] <= 32 || sb->buf[i] >= 127)
> > +                     sb->buf[i] = '.';
> > +     }
> > +}
>
> This looked a bit _too_ specific for the use of the transport layer
> (which raises the question if it should even live in strbuf.[ch]).
> It also made me wonder if different callers likely want to have
> different variants (e.g., do not trim, only trim at the tail, squash
> a run of unprintables into a single '.', use '?'  instead of '.',
> etc., etc.).
>
> It turns out that there is only *one* existing caller that gets
> replaced with this "common" version, which made it a Meh to me.
>
> Let's hope that there will be many new callers to make this step
> worthwhile.

A very similar step was also part of my previous patch series to add
an OS version to the protocol. See:

https://lore.kernel.org/git/20240619125708.3719150-2-christian.couder@gmail.com/

My opinion is that the code is doing something often needed when
dealing with the protocol, so it is worth it to refactor that code
soon, and then adapt it later when needed with options (to not trim,
only trim at the tail, use '?'  instead of '.', etc).

I am not sure if it should live in strbuf.[ch], but on the other hand
if we indeed adapt it over time with a number of options for different
use cases, it might end up in strbuf.[ch], so it is a reasonable bet
to put it there right away. I must also say that I don't know which
other place(s) would be a good home for it.
diff mbox series

Patch

diff --git a/strbuf.c b/strbuf.c
index 3d2189a7f6..cccfdec0e3 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -1082,3 +1082,12 @@  void strbuf_strip_file_from_path(struct strbuf *sb)
 	char *path_sep = find_last_dir_sep(sb->buf);
 	strbuf_setlen(sb, path_sep ? path_sep - sb->buf + 1 : 0);
 }
+
+void strbuf_sanitize(struct strbuf *sb)
+{
+	strbuf_trim(sb);
+	for (size_t i = 0; i < sb->len; i++) {
+		if (sb->buf[i] <= 32 || sb->buf[i] >= 127)
+			sb->buf[i] = '.';
+	}
+}
diff --git a/strbuf.h b/strbuf.h
index 003f880ff7..884157873e 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -664,6 +664,13 @@  typedef int (*char_predicate)(char ch);
 void strbuf_addstr_urlencode(struct strbuf *sb, const char *name,
 			     char_predicate allow_unencoded_fn);
 
+/*
+ * Trim and replace each character with ascii code below 32 or above
+ * 127 (included) using a dot '.' character. Useful for sending
+ * capabilities.
+ */
+void strbuf_sanitize(struct strbuf *sb);
+
 __attribute__((format (printf,1,2)))
 int printf_ln(const char *fmt, ...);
 __attribute__((format (printf,2,3)))
diff --git a/version.c b/version.c
index 41b718c29e..951e6dca74 100644
--- a/version.c
+++ b/version.c
@@ -24,15 +24,10 @@  const char *git_user_agent_sanitized(void)
 
 	if (!agent) {
 		struct strbuf buf = STRBUF_INIT;
-		int i;
 
 		strbuf_addstr(&buf, git_user_agent());
-		strbuf_trim(&buf);
-		for (i = 0; i < buf.len; i++) {
-			if (buf.buf[i] <= 32 || buf.buf[i] >= 127)
-				buf.buf[i] = '.';
-		}
-		agent = buf.buf;
+		strbuf_sanitize(&buf);
+		agent = strbuf_detach(&buf, NULL);
 	}
 
 	return agent;