Message ID | 20241206124248.160494-2-christian.couder@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Introduce a "promisor-remote" capability | expand |
Christian Couder <christian.couder@gmail.com> writes: > +/* > + * Trim and replace each character with ascii code below 32 or above > + * 127 (included) using a dot '.' character. Useful for sending > + * capabilities. > + */ > +void strbuf_sanitize(struct strbuf *sb); I am not getting "Useful for sending capabilities" here, and feel that it is somewhat an unsubstantiated claim. If some information is going to be transferred (which the phrase "sending capabilities" hints), I'd expect that we try as hard as possible not to lose information, but redact-non-ASCII is the total opposite of "not losing information". > diff --git a/version.c b/version.c > index 41b718c29e..951e6dca74 100644 > --- a/version.c > +++ b/version.c > @@ -24,15 +24,10 @@ const char *git_user_agent_sanitized(void) > > if (!agent) { > struct strbuf buf = STRBUF_INIT; > - int i; > > strbuf_addstr(&buf, git_user_agent()); > - strbuf_trim(&buf); > - for (i = 0; i < buf.len; i++) { > - if (buf.buf[i] <= 32 || buf.buf[i] >= 127) > - buf.buf[i] = '.'; > - } > - agent = buf.buf; > + strbuf_sanitize(&buf); > + agent = strbuf_detach(&buf, NULL); > } > > return agent; This is very faithful rewrite of the original. The original had a strbuf on stack, and after creating user-agent string in it, a function scope static variable "agent" is made to point at it and then the stack the strbuf was on is allowed to go out of scope. Since the variable "agent" is holding onto the piece of memory, the leak checker does not complain about anything. The rewritten version is leak-free for exactly the same reason, but because it calls strbuf_detach() before the strbuf goes out of scope to officially transfer the ownership to the variable "agent", it tells what is going on to readers a lot more clearly. Nicely done. By the way, as we are trimming, I am very very much tempted to squish a run of non-ASCII bytes into one dot, perhaps like void redact_non_printables(struct strbuf *sb) { size_t dst = 0; int skipped = 0; strbuf_trim(sb); for (size_t src = 0; src < sb->len; src++) { int ch = sb->buf[src]; if (ch <= 32 && 127 <= ch) { if (skipped) continue; ch = '.'; } sb->buf[dst++] = ch; skipped = (ch == '.'); } } or even without strbuf_trim(), which would turn any leading or trailing run of whitespaces into '.'. But that is an improvement that can be easily done on top after the dust settles and better left as #leftoverbits material.
diff --git a/strbuf.c b/strbuf.c index 3d2189a7f6..cccfdec0e3 100644 --- a/strbuf.c +++ b/strbuf.c @@ -1082,3 +1082,12 @@ void strbuf_strip_file_from_path(struct strbuf *sb) char *path_sep = find_last_dir_sep(sb->buf); strbuf_setlen(sb, path_sep ? path_sep - sb->buf + 1 : 0); } + +void strbuf_sanitize(struct strbuf *sb) +{ + strbuf_trim(sb); + for (size_t i = 0; i < sb->len; i++) { + if (sb->buf[i] <= 32 || sb->buf[i] >= 127) + sb->buf[i] = '.'; + } +} diff --git a/strbuf.h b/strbuf.h index 003f880ff7..884157873e 100644 --- a/strbuf.h +++ b/strbuf.h @@ -664,6 +664,13 @@ typedef int (*char_predicate)(char ch); void strbuf_addstr_urlencode(struct strbuf *sb, const char *name, char_predicate allow_unencoded_fn); +/* + * Trim and replace each character with ascii code below 32 or above + * 127 (included) using a dot '.' character. Useful for sending + * capabilities. + */ +void strbuf_sanitize(struct strbuf *sb); + __attribute__((format (printf,1,2))) int printf_ln(const char *fmt, ...); __attribute__((format (printf,2,3))) diff --git a/version.c b/version.c index 41b718c29e..951e6dca74 100644 --- a/version.c +++ b/version.c @@ -24,15 +24,10 @@ const char *git_user_agent_sanitized(void) if (!agent) { struct strbuf buf = STRBUF_INIT; - int i; strbuf_addstr(&buf, git_user_agent()); - strbuf_trim(&buf); - for (i = 0; i < buf.len; i++) { - if (buf.buf[i] <= 32 || buf.buf[i] >= 127) - buf.buf[i] = '.'; - } - agent = buf.buf; + strbuf_sanitize(&buf); + agent = strbuf_detach(&buf, NULL); } return agent;
The git_user_agent_sanitized() function performs some sanitizing to avoid special characters being sent over the line and possibly messing up with the protocol or with the parsing on the other side. Let's extract this sanitizing into a new strbuf_sanitize() function, as we will want to reuse it in a following patch, and let's put it into strbuf.{c,h}. While at it, let's also make a few small improvements: - use 'size_t' for 'i' instead of 'int', - move the declaration of 'i' inside the 'for ( ... )', - use strbuf_detach() to explicitly detach the string contained by the 'sb' strbuf. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> --- strbuf.c | 9 +++++++++ strbuf.h | 7 +++++++ version.c | 9 ++------- 3 files changed, 18 insertions(+), 7 deletions(-)