diff mbox series

cache-tree: fix strbuf growth in prime_cache_tree_rec()

Message ID ff3ac119-9b00-746f-470c-8db18c9c61a1@web.de (mailing list archive)
State Superseded
Headers show
Series cache-tree: fix strbuf growth in prime_cache_tree_rec() | expand

Commit Message

René Scharfe Feb. 4, 2023, 7:10 p.m. UTC
Use size_t to store the original length of the strbuf tree_len, as
that's the correct type.

Don't double the allocated size of the strbuf when adding a subdirectory
name.  Only extend it to fit that name and a slash.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 cache-tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--
2.39.1

Comments

Ævar Arnfjörð Bjarmason Feb. 5, 2023, 9:12 p.m. UTC | #1
On Sat, Feb 04 2023, René Scharfe wrote:

> Use size_t to store the original length of the strbuf tree_len, as
> that's the correct type.
>
> Don't double the allocated size of the strbuf when adding a subdirectory
> name.  Only extend it to fit that name and a slash.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  cache-tree.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 9af457f47c..35f7617164 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -760,7 +760,7 @@ static void prime_cache_tree_rec(struct repository *r,
>  	struct tree_desc desc;
>  	struct name_entry entry;
>  	int cnt;
> -	int base_path_len = tree_path->len;
> +	size_t base_path_len = tree_path->len;
>
>  	oidcpy(&it->oid, &tree->object.oid);
>
> @@ -785,7 +785,7 @@ static void prime_cache_tree_rec(struct repository *r,
>  			 */
>  			if (r->index->sparse_index) {
>  				strbuf_setlen(tree_path, base_path_len);
> -				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
> +				strbuf_grow(tree_path, entry.pathlen + 1);
>  				strbuf_add(tree_path, entry.path, entry.pathlen);
>  				strbuf_addch(tree_path, '/');
>  			}

The size_t conversion is trivially correct.

But what do you mean with "don't double the[...]"? Do you mean that this
manages to evade growing these to 24 etc?

One wonders if (even for this index-related code) we really need such
careful management of growth, and could instead do with:

	strbuf_setlen(tree_path, base_path_len);
	strbuf_add(tree_path, entry.path, entry.pathlen);
	strbuf_addch(tree_path, '/');

Or even just:

	strbuf_addf(tree_path, "%*.s/", (int)entry.pathlen, entry.path);

As e.g. the t1092 test that runs this codepath shows we're looping
through the index entires here and will (in that case) handle names & a
"tree_path" like (these are the entry.path values):

	""
	"before"

Which here are turned into:

	"before/"

But both before & after your change we'll grow the "alloc" to 24, which
isn't surprising, as both the string length of (sans slash) 6 and 7 plus
1 for "\0" is rounde up to 24 with the standard growth pattern.

So I think if we're doing a "while-at-it" fixing of the off-by-one here
we might be better of asking whether it's needed at all, and whether
this case can't just be left to the growth smartness of the
strbuf+ALLOC_GROW() API instead.
Derrick Stolee Feb. 6, 2023, 3:27 p.m. UTC | #2
On 2/5/2023 4:12 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Feb 04 2023, René Scharfe wrote:
> 
>> Use size_t to store the original length of the strbuf tree_len, as
>> that's the correct type.
>>
>> Don't double the allocated size of the strbuf when adding a subdirectory
>> name.  Only extend it to fit that name and a slash.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  cache-tree.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/cache-tree.c b/cache-tree.c
>> index 9af457f47c..35f7617164 100644
>> --- a/cache-tree.c
>> +++ b/cache-tree.c
>> @@ -760,7 +760,7 @@ static void prime_cache_tree_rec(struct repository *r,
>>  	struct tree_desc desc;
>>  	struct name_entry entry;
>>  	int cnt;
>> -	int base_path_len = tree_path->len;
>> +	size_t base_path_len = tree_path->len;
>>
>>  	oidcpy(&it->oid, &tree->object.oid);
>>
>> @@ -785,7 +785,7 @@ static void prime_cache_tree_rec(struct repository *r,
>>  			 */
>>  			if (r->index->sparse_index) {
>>  				strbuf_setlen(tree_path, base_path_len);
>> -				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
>> +				strbuf_grow(tree_path, entry.pathlen + 1);
>>  				strbuf_add(tree_path, entry.path, entry.pathlen);
>>  				strbuf_addch(tree_path, '/');
>>  			}
> 
> The size_t conversion is trivially correct.

I agree, and thanks for finding and fixing this issue.

Upon reading strbuf_grow(), I would expect it to work the same
as ALLOC_GROW(), but its documentation clearly states a very
different result.

> One wonders if (even for this index-related code) we really need such
> careful management of growth, and could instead do with:
> 
> 	strbuf_setlen(tree_path, base_path_len);
> 	strbuf_add(tree_path, entry.path, entry.pathlen);
> 	strbuf_addch(tree_path, '/');

This would be my preferred way to go here.

> Or even just:
> 
> 	strbuf_addf(tree_path, "%*.s/", (int)entry.pathlen, entry.path);

Please do not add "addf" functions that can be run in tight loops.
It's faster to do strbuf_add() followed by strbuf_addch().

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason Feb. 6, 2023, 4:18 p.m. UTC | #3
On Mon, Feb 06 2023, Derrick Stolee wrote:

> On 2/5/2023 4:12 PM, Ævar Arnfjörð Bjarmason wrote:
> [...]
>> One wonders if (even for this index-related code) we really need such
>> careful management of growth, and could instead do with:
>> 
>> 	strbuf_setlen(tree_path, base_path_len);
>> 	strbuf_add(tree_path, entry.path, entry.pathlen);
>> 	strbuf_addch(tree_path, '/');
>
> This would be my preferred way to go here.

*nod*

>> Or even just:
>> 
>> 	strbuf_addf(tree_path, "%*.s/", (int)entry.pathlen, entry.path);
>
> Please do not add "addf" functions that can be run in tight loops.
> It's faster to do strbuf_add() followed by strbuf_addch().

Good point.

I wondered just how much slower, and it's up to 3x! At least according
to this[1] artificial test case (where I usurped a random test helper).

I wondered if we could just handle some common strbuf_addf() cases
ourselves, and the benchmark shows (manually annotated, too lazy to set
up the -n option):

	git hyperfine -L rev HEAD~5,HEAD~4,HEAD~3,HEAD~2,HEAD~1,HEAD~0 -s 'make CFLAGS=-O3' './t/helper/test-tool online-cpus' -r 3
	[...]
	Summary
	  './t/helper/test-tool online-cpus' in 'HEAD~0' ran <== strbuf_add() + strbuf_addch()
	    1.06 ± 0.11 times faster than './t/helper/test-tool online-cpus' in 'HEAD~1' <== strbuf_addstr() + strbuf_addch()
	    1.18 ± 0.12 times faster than './t/helper/test-tool online-cpus' in 'HEAD~4' <== hand optimized strbuf_addf() for "%sC"
	    1.33 ± 0.18 times faster than './t/helper/test-tool online-cpus' in 'HEAD~2' <== hand optimized strbuf_addf() for "%*sC"
	    2.63 ± 0.05 times faster than './t/helper/test-tool online-cpus' in 'HEAD~5' <== strbuf_addf("%s/")
	    2.92 ± 0.25 times faster than './t/helper/test-tool online-cpus' in 'HEAD~3' <== strbuf_addf("%*s/")

The "hand optimization" just being a very stupid handling of "%sC" for
arbitrary values of a single char "C", and ditto for "%*sC" (which
curiously is slower here).

So, for truly hot loops we'd still want to use the add + addch, but if
anyone's interested (hashtag #leftoverbits) it looks like we could get
some easy wins (and reduction in code size, as we could stop worrying
about addf being slow in most cases) with some very dumb minimal
vaddf(), which could handle these cases (but not anything involving
padding etc.).

I didn't dig, but wouldn't be surprised if the reason is that C
libraries need to carry a relatively fat & general sprintf() for all
those edge cases, locale handling etc, whereas most of our use could
trivially be represented as some sequence of addstr()/addf() etc.

Another interesting approach (and this is very #leftoverbits) would be
to perform the same optimization with coccinelle.

I.e. our current use of it is purely "this code X should be written like
Y, and we should commit Y".

But there's no reason for why we couldn't effectively implement our own
compiler optimizations for our own APIs with it, so just grab "%s/" etc,
unpack that in OCaml, then emit strbuf_add() + strbuf_addch(), and that
would be what the C compiler would see.

1.
	
	9d23ffb1117 addf + nolen
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index 8cb0d53840f..c802ec579d0 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -1,9 +1,17 @@
	 #include "test-tool.h"
	 #include "git-compat-util.h"
	 #include "thread-utils.h"
*	+#include "strbuf.h"
	 
	 int cmd__online_cpus(int argc, const char **argv)
	 {
	-	printf("%d\n", online_cpus());
	+	struct strbuf sb = STRBUF_INIT;
	+	const char *const str = "Hello, World";
	+
	+	for (size_t i = 0; i < 10000000; i++) {
	+		strbuf_reset(&sb);
	+		strbuf_addf(&sb, "%s/", str);
	+		puts(sb.buf);
	+	}
	 	return 0;
	 }
	9f74eff5623 addf + nolen optimize
	diff --git a/strbuf.c b/strbuf.c
	index c383f41a3c5..750e5e6a5b4 100644
	--- a/strbuf.c
	+++ b/strbuf.c
	@@ -332,8 +332,16 @@ void strbuf_addchars(struct strbuf *sb, int c, size_t n)
	 void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
	 {
	 	va_list ap;
	+
	 	va_start(ap, fmt);
	-	strbuf_vaddf(sb, fmt, ap);
	+	if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) {
	+		const char *arg = va_arg(ap, const char *);
	+
	+		strbuf_addstr(sb, arg);
	+		strbuf_addch(sb, *(fmt + 2));
	+	} else {
	+		strbuf_vaddf(sb, fmt, ap);
	+	}
	 	va_end(ap);
	 }
	 
	ca60bb9b479 addf + len
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index c802ec579d0..7257e622015 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
	 {
	 	struct strbuf sb = STRBUF_INIT;
	 	const char *const str = "Hello, World";
	+	const size_t len = strlen(str);
	 
	 	for (size_t i = 0; i < 10000000; i++) {
	 		strbuf_reset(&sb);
	-		strbuf_addf(&sb, "%s/", str);
	+		strbuf_addf(&sb, "%*s/", (int)len, str);
	 		puts(sb.buf);
	 	}
	 	return 0;
	1f47987d095 addf + len optimize
	diff --git a/strbuf.c b/strbuf.c
	index 750e5e6a5b4..88801268f7a 100644
	--- a/strbuf.c
	+++ b/strbuf.c
	@@ -334,11 +334,16 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
	 	va_list ap;
	 
	 	va_start(ap, fmt);
	-	if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) {
	+	if (*fmt == '%' &&
	+	    *(fmt + 1) == '*' &&
	+	    *(fmt + 2) == 's' &&
	+	    *(fmt + 3) &&
	+	    !*(fmt + 4)) {
	+		int len = va_arg(ap, int);
	 		const char *arg = va_arg(ap, const char *);
	 
	-		strbuf_addstr(sb, arg);
	-		strbuf_addch(sb, *(fmt + 2));
	+		strbuf_add(sb, arg, len);
	+		strbuf_addch(sb, *(fmt + 3));
	 	} else {
	 		strbuf_vaddf(sb, fmt, ap);
	 	}
	55c698c0b95 addstr
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index 7257e622015..2716b44ca15 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -7,11 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
	 {
	 	struct strbuf sb = STRBUF_INIT;
	 	const char *const str = "Hello, World";
	-	const size_t len = strlen(str);
	 
	 	for (size_t i = 0; i < 10000000; i++) {
	 		strbuf_reset(&sb);
	-		strbuf_addf(&sb, "%*s/", (int)len, str);
	+		strbuf_addstr(&sb, str);
	+		strbuf_addch(&sb, '/');
	 		puts(sb.buf);
	 	}
	 	return 0;
	b17fb99bf7e (HEAD -> master) add
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index 2716b44ca15..5e52b622c4d 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
	 {
	 	struct strbuf sb = STRBUF_INIT;
	 	const char *const str = "Hello, World";
	+	const size_t len = strlen(str);
	 
	 	for (size_t i = 0; i < 10000000; i++) {
	 		strbuf_reset(&sb);
	-		strbuf_addstr(&sb, str);
	+		strbuf_add(&sb, str, len);
	 		strbuf_addch(&sb, '/');
	 		puts(sb.buf);
	 	}
Junio C Hamano Feb. 6, 2023, 6:55 p.m. UTC | #4
Derrick Stolee <derrickstolee@github.com> writes:

>>> -	int base_path_len = tree_path->len;
>>> +	size_t base_path_len = tree_path->len;
>>> ...
>>>  				strbuf_setlen(tree_path, base_path_len);
>>> -				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
>>> +				strbuf_grow(tree_path, entry.pathlen + 1);
>>>  				strbuf_add(tree_path, entry.path, entry.pathlen);
>>> ...
>> 
>> The size_t conversion is trivially correct.
>
> I agree, and thanks for finding and fixing this issue.
> ...
>> One wonders if (even for this index-related code) we really need such
>> careful management of growth, and could instead do with:
>> 
>> 	strbuf_setlen(tree_path, base_path_len);
>> 	strbuf_add(tree_path, entry.path, entry.pathlen);
>> 	strbuf_addch(tree_path, '/');
>
> This would be my preferred way to go here.

Yup.  _setlen() is still very useful to truncate an existing
contents in a strbuf, but we should look at each use of _grow() with
suspicion that it might be an mistaken attempt to "optimize" where
there is not a room for optimization.

There may be cases where tight and exact allocation is desirable but
this is not such a codepath.  tree_path will not be overly long that
we want to avoid extra allocation for giving us slack to prepare for
reallocation.  And tree_path is passed to recusive calls to further
be grown, which is exactly why we would want to use ALLOW_GROW() kind
of allocation with slack to amortize the allocation cost.

Thanks.
René Scharfe Feb. 10, 2023, 8:20 p.m. UTC | #5
Am 05.02.23 um 22:12 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sat, Feb 04 2023, René Scharfe wrote:
>
>> Use size_t to store the original length of the strbuf tree_len, as
>> that's the correct type.
>>
>> Don't double the allocated size of the strbuf when adding a subdirectory
>> name.  Only extend it to fit that name and a slash.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  cache-tree.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/cache-tree.c b/cache-tree.c
>> index 9af457f47c..35f7617164 100644
>> --- a/cache-tree.c
>> +++ b/cache-tree.c
>> @@ -760,7 +760,7 @@ static void prime_cache_tree_rec(struct repository *r,
>>  	struct tree_desc desc;
>>  	struct name_entry entry;
>>  	int cnt;
>> -	int base_path_len = tree_path->len;
>> +	size_t base_path_len = tree_path->len;
>>
>>  	oidcpy(&it->oid, &tree->object.oid);
>>
>> @@ -785,7 +785,7 @@ static void prime_cache_tree_rec(struct repository *r,
>>  			 */
>>  			if (r->index->sparse_index) {
>>  				strbuf_setlen(tree_path, base_path_len);
>> -				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
>> +				strbuf_grow(tree_path, entry.pathlen + 1);
>>  				strbuf_add(tree_path, entry.path, entry.pathlen);
>>  				strbuf_addch(tree_path, '/');
>>  			}
>
> The size_t conversion is trivially correct.
>
> But what do you mean with "don't double the[...]"? Do you mean that this
> manages to evade growing these to 24 etc?

strbuf_setlen() truncates the string to the directory name.  strbuf_grow() then
makes enough room to add that directory name again (that's what I mean with
"double") plus the entry path.  We don't add the directory name a second time,
so we don't need to make room for it.

René
René Scharfe Feb. 10, 2023, 8:20 p.m. UTC | #6
Am 06.02.23 um 19:55 schrieb Junio C Hamano:
> Derrick Stolee <derrickstolee@github.com> writes:
>
>>>> -	int base_path_len = tree_path->len;
>>>> +	size_t base_path_len = tree_path->len;
>>>> ...
>>>>  				strbuf_setlen(tree_path, base_path_len);
>>>> -				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
>>>> +				strbuf_grow(tree_path, entry.pathlen + 1);
>>>>  				strbuf_add(tree_path, entry.path, entry.pathlen);
>>>> ...
>>>
>>> The size_t conversion is trivially correct.
>>
>> I agree, and thanks for finding and fixing this issue.
>> ...
>>> One wonders if (even for this index-related code) we really need such
>>> careful management of growth, and could instead do with:
>>>
>>> 	strbuf_setlen(tree_path, base_path_len);
>>> 	strbuf_add(tree_path, entry.path, entry.pathlen);
>>> 	strbuf_addch(tree_path, '/');
>>
>> This would be my preferred way to go here.
>
> Yup.  _setlen() is still very useful to truncate an existing
> contents in a strbuf, but we should look at each use of _grow() with
> suspicion that it might be an mistaken attempt to "optimize" where
> there is not a room for optimization.
>
> There may be cases where tight and exact allocation is desirable but
> this is not such a codepath.  tree_path will not be overly long that
> we want to avoid extra allocation for giving us slack to prepare for
> reallocation.  And tree_path is passed to recusive calls to further
> be grown, which is exactly why we would want to use ALLOW_GROW() kind
> of allocation with slack to amortize the allocation cost.

Right you are.

René
Junio C Hamano Feb. 10, 2023, 8:33 p.m. UTC | #7
René Scharfe <l.s.r@web.de> writes:

>>> -				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
>>> +				strbuf_grow(tree_path, entry.pathlen + 1);
>>>  				strbuf_add(tree_path, entry.path, entry.pathlen);
>>>  				strbuf_addch(tree_path, '/');
>>>  			}
>>
>> The size_t conversion is trivially correct.
>>
>> But what do you mean with "don't double the[...]"? Do you mean that this
>> manages to evade growing these to 24 etc?
>
> strbuf_setlen() truncates the string to the directory name.  strbuf_grow() then
> makes enough room to add that directory name again (that's what I mean with
> "double") plus the entry path.  We don't add the directory name a second time,
> so we don't need to make room for it.

Yeah, I think I made the same mistake number of years ago, thinking
that strbuf_grow() was to grow the buffer to the given size, but in
reality it is to grow the buffer by the given size, which felt a bit
unnatural, at least to me.  I do not feel it too strongly but we
might want to rename _grow() to _grow_by() and make _grow() call it
while giving deprecation warning X-<.

There are ~45 calls to strbuf_grow() in C files other than strbuf.c;
I suspect probably a half or more of them can and should be removed
to reduce the resulting code size without hurting anything.
Jeff King Feb. 11, 2023, 2:15 a.m. UTC | #8
On Fri, Feb 10, 2023 at 12:33:00PM -0800, Junio C Hamano wrote:

> > strbuf_setlen() truncates the string to the directory name.  strbuf_grow() then
> > makes enough room to add that directory name again (that's what I mean with
> > "double") plus the entry path.  We don't add the directory name a second time,
> > so we don't need to make room for it.
> 
> Yeah, I think I made the same mistake number of years ago, thinking
> that strbuf_grow() was to grow the buffer to the given size, but in
> reality it is to grow the buffer by the given size, which felt a bit
> unnatural, at least to me.  I do not feel it too strongly but we
> might want to rename _grow() to _grow_by() and make _grow() call it
> while giving deprecation warning X-<.

Having been confused by that myself, I would be happy to see such a
name change.

> There are ~45 calls to strbuf_grow() in C files other than strbuf.c;
> I suspect probably a half or more of them can and should be removed
> to reduce the resulting code size without hurting anything.

My gut feeling is that your suspicion is giving strbuf_grow() users too
much credit. ;) And having looked at the first 7 grep hits, every single
one of them seemed pointless to me.

I wonder if these would make a good #leftoverbits / micro-project
candidate.

-Peff
Junio C Hamano Feb. 11, 2023, 2:46 a.m. UTC | #9
Jeff King <peff@peff.net> writes:

>> ...  I do not feel it too strongly but we
>> might want to rename _grow() to _grow_by() and make _grow() call it
>> while giving deprecation warning X-<.
>
> Having been confused by that myself, I would be happy to see such a
> name change.

If we did not know how useless explicit growth control is, we would
likely have a pair of helpers, _grow_by() and _grow_to(), but given
that ...

>> There are ~45 calls to strbuf_grow() in C files other than strbuf.c;
>> I suspect probably a half or more of them can and should be removed
>> to reduce the resulting code size without hurting anything.
>
> My gut feeling is that your suspicion is giving strbuf_grow() users too
> much credit. ;) And having looked at the first 7 grep hits, every single
> one of them seemed pointless to me.

... we'd only have a very limited number of callers for which the
helper makes sense, I am not sure if it is even worth the renaming.

Or just rename it to _grow_to() while fixing what it does, as
grow_to() is what programmers would expect naturally?

> I wonder if these would make a good #leftoverbits / micro-project
> candidate.

The task is to pick one or two from these 45 hits, analyze what
would happen if we remove the _grow() calls.  For many of them, the
result of such analysis would say the calls are pointless, but for
some, there hopefully are solid reasons why explicit sizing makes
sense.  The former will be just removed, while the latter will be
kept with a new in-code comment to record why it is worth having the
call.  The parameter may need to be updated for them at the same
time.

It can be done extremely poorly without breaking anything in the
test suite, or it can be done expertly.  Unless the result are
reviewed competently, it may make a rather poor micro-project
experience.

So, I dunno.
René Scharfe Feb. 12, 2023, 11:20 a.m. UTC | #10
Am 06.02.23 um 17:18 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Feb 06 2023, Derrick Stolee wrote:
>
>> On 2/5/2023 4:12 PM, Ævar Arnfjörð Bjarmason wrote:
>
>>> Or even just:
>>>
>>> 	strbuf_addf(tree_path, "%*.s/", (int)entry.pathlen, entry.path);
>>
>> Please do not add "addf" functions that can be run in tight loops.
>> It's faster to do strbuf_add() followed by strbuf_addch().
>
> Good point.
>
> I wondered just how much slower, and it's up to 3x! At least according
> to this[1] artificial test case (where I usurped a random test helper).
>
> I wondered if we could just handle some common strbuf_addf() cases
> ourselves, and the benchmark shows (manually annotated, too lazy to set
> up the -n option):
>
> 	git hyperfine -L rev HEAD~5,HEAD~4,HEAD~3,HEAD~2,HEAD~1,HEAD~0 -s 'make CFLAGS=-O3' './t/helper/test-tool online-cpus' -r 3
> 	[...]
> 	Summary
> 	  './t/helper/test-tool online-cpus' in 'HEAD~0' ran <== strbuf_add() + strbuf_addch()
> 	    1.06 ± 0.11 times faster than './t/helper/test-tool online-cpus' in 'HEAD~1' <== strbuf_addstr() + strbuf_addch()
> 	    1.18 ± 0.12 times faster than './t/helper/test-tool online-cpus' in 'HEAD~4' <== hand optimized strbuf_addf() for "%sC"
> 	    1.33 ± 0.18 times faster than './t/helper/test-tool online-cpus' in 'HEAD~2' <== hand optimized strbuf_addf() for "%*sC"
> 	    2.63 ± 0.05 times faster than './t/helper/test-tool online-cpus' in 'HEAD~5' <== strbuf_addf("%s/")
> 	    2.92 ± 0.25 times faster than './t/helper/test-tool online-cpus' in 'HEAD~3' <== strbuf_addf("%*s/")

Woah!

> The "hand optimization" just being a very stupid handling of "%sC" for
> arbitrary values of a single char "C", and ditto for "%*sC" (which
> curiously is slower here).

"%*s" adds padding if needed, your version doesn't.  Perhaps you thought
of "%.*s"?  That might be relevant because for "%*s" vsnprintf(3) needs
to run strlen(3) again on the argument, while for "%.*s" it can stop
when the given length is reached.

> So, for truly hot loops we'd still want to use the add + addch, but if
> anyone's interested (hashtag #leftoverbits) it looks like we could get
> some easy wins (and reduction in code size, as we could stop worrying
> about addf being slow in most cases) with some very dumb minimal
> vaddf(), which could handle these cases (but not anything involving
> padding etc.).
>
> I didn't dig, but wouldn't be surprised if the reason is that C
> libraries need to carry a relatively fat & general sprintf() for all
> those edge cases, locale handling etc, whereas most of our use could
> trivially be represented as some sequence of addstr()/addf() etc.

If that's the reason then resisting the urge to handle ever more cases
in strbuf_addf() would be quite important.

> Another interesting approach (and this is very #leftoverbits) would be
> to perform the same optimization with coccinelle.
>
> I.e. our current use of it is purely "this code X should be written like
> Y, and we should commit Y".
>
> But there's no reason for why we couldn't effectively implement our own
> compiler optimizations for our own APIs with it, so just grab "%s/" etc,
> unpack that in OCaml, then emit strbuf_add() + strbuf_addch(), and that
> would be what the C compiler would see.

Extracting the %s is technically possible using a semantic patch without
scripting:

   @@
   expression sb, str;
   format fmt =~ "^s$";
   @@
   + strbuf_addstr(sb, str);
     strbuf_addf(sb, "%@fmt@..."
   - , str
   + + 2
     );

The "+ 2" is ugly and runs afoul of compiler warning -Wstring-plus-int,
though.  Resolving this probably requires Python scripting as in
https://github.com/coccinelle/coccinelle/blob/master/demos/format.cocci,
or the OCaml magic you have in mind.  I have to admit that I don't even
understand the linked examples, however. :-/

The warning can be avoided by using an array subscription, by the way,
but that's even uglier:

   @@
   expression sb, str;
   format fmt =~ "^s$";
   @@
   + strbuf_addstr(sb, str);
     strbuf_addf(sb,
   + &
     "%@fmt@..."
   - , str
   + [2]
     );

>
> 1.
>
> 	9d23ffb1117 addf + nolen
> 	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
> 	index 8cb0d53840f..c802ec579d0 100644
> 	--- a/t/helper/test-online-cpus.c
> 	+++ b/t/helper/test-online-cpus.c
> 	@@ -1,9 +1,17 @@
> 	 #include "test-tool.h"
> 	 #include "git-compat-util.h"
> 	 #include "thread-utils.h"
> *	+#include "strbuf.h"
>
> 	 int cmd__online_cpus(int argc, const char **argv)
> 	 {
> 	-	printf("%d\n", online_cpus());
> 	+	struct strbuf sb = STRBUF_INIT;
> 	+	const char *const str = "Hello, World";
> 	+
> 	+	for (size_t i = 0; i < 10000000; i++) {
> 	+		strbuf_reset(&sb);
> 	+		strbuf_addf(&sb, "%s/", str);
> 	+		puts(sb.buf);
> 	+	}
> 	 	return 0;
> 	 }
> 	9f74eff5623 addf + nolen optimize
> 	diff --git a/strbuf.c b/strbuf.c
> 	index c383f41a3c5..750e5e6a5b4 100644
> 	--- a/strbuf.c
> 	+++ b/strbuf.c
> 	@@ -332,8 +332,16 @@ void strbuf_addchars(struct strbuf *sb, int c, size_t n)
> 	 void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
> 	 {
> 	 	va_list ap;
> 	+
> 	 	va_start(ap, fmt);
> 	-	strbuf_vaddf(sb, fmt, ap);
> 	+	if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) {
> 	+		const char *arg = va_arg(ap, const char *);
> 	+
> 	+		strbuf_addstr(sb, arg);
> 	+		strbuf_addch(sb, *(fmt + 2));
> 	+	} else {
> 	+		strbuf_vaddf(sb, fmt, ap);
> 	+	}
> 	 	va_end(ap);
> 	 }
>
> 	ca60bb9b479 addf + len
> 	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
> 	index c802ec579d0..7257e622015 100644
> 	--- a/t/helper/test-online-cpus.c
> 	+++ b/t/helper/test-online-cpus.c
> 	@@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
> 	 {
> 	 	struct strbuf sb = STRBUF_INIT;
> 	 	const char *const str = "Hello, World";
> 	+	const size_t len = strlen(str);
>
> 	 	for (size_t i = 0; i < 10000000; i++) {
> 	 		strbuf_reset(&sb);
> 	-		strbuf_addf(&sb, "%s/", str);
> 	+		strbuf_addf(&sb, "%*s/", (int)len, str);
> 	 		puts(sb.buf);
> 	 	}
> 	 	return 0;
> 	1f47987d095 addf + len optimize
> 	diff --git a/strbuf.c b/strbuf.c
> 	index 750e5e6a5b4..88801268f7a 100644
> 	--- a/strbuf.c
> 	+++ b/strbuf.c
> 	@@ -334,11 +334,16 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
> 	 	va_list ap;
>
> 	 	va_start(ap, fmt);
> 	-	if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) {
> 	+	if (*fmt == '%' &&
> 	+	    *(fmt + 1) == '*' &&
> 	+	    *(fmt + 2) == 's' &&
> 	+	    *(fmt + 3) &&
> 	+	    !*(fmt + 4)) {
> 	+		int len = va_arg(ap, int);
> 	 		const char *arg = va_arg(ap, const char *);
>
> 	-		strbuf_addstr(sb, arg);
> 	-		strbuf_addch(sb, *(fmt + 2));
> 	+		strbuf_add(sb, arg, len);
> 	+		strbuf_addch(sb, *(fmt + 3));
> 	 	} else {
> 	 		strbuf_vaddf(sb, fmt, ap);
> 	 	}
> 	55c698c0b95 addstr
> 	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
> 	index 7257e622015..2716b44ca15 100644
> 	--- a/t/helper/test-online-cpus.c
> 	+++ b/t/helper/test-online-cpus.c
> 	@@ -7,11 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
> 	 {
> 	 	struct strbuf sb = STRBUF_INIT;
> 	 	const char *const str = "Hello, World";
> 	-	const size_t len = strlen(str);
>
> 	 	for (size_t i = 0; i < 10000000; i++) {
> 	 		strbuf_reset(&sb);
> 	-		strbuf_addf(&sb, "%*s/", (int)len, str);
> 	+		strbuf_addstr(&sb, str);
> 	+		strbuf_addch(&sb, '/');
> 	 		puts(sb.buf);
> 	 	}
> 	 	return 0;
> 	b17fb99bf7e (HEAD -> master) add
> 	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
> 	index 2716b44ca15..5e52b622c4d 100644
> 	--- a/t/helper/test-online-cpus.c
> 	+++ b/t/helper/test-online-cpus.c
> 	@@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
> 	 {
> 	 	struct strbuf sb = STRBUF_INIT;
> 	 	const char *const str = "Hello, World";
> 	+	const size_t len = strlen(str);
>
> 	 	for (size_t i = 0; i < 10000000; i++) {
> 	 		strbuf_reset(&sb);
> 	-		strbuf_addstr(&sb, str);
> 	+		strbuf_add(&sb, str, len);
> 	 		strbuf_addch(&sb, '/');
> 	 		puts(sb.buf);
> 	 	}
diff mbox series

Patch

diff --git a/cache-tree.c b/cache-tree.c
index 9af457f47c..35f7617164 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -760,7 +760,7 @@  static void prime_cache_tree_rec(struct repository *r,
 	struct tree_desc desc;
 	struct name_entry entry;
 	int cnt;
-	int base_path_len = tree_path->len;
+	size_t base_path_len = tree_path->len;

 	oidcpy(&it->oid, &tree->object.oid);

@@ -785,7 +785,7 @@  static void prime_cache_tree_rec(struct repository *r,
 			 */
 			if (r->index->sparse_index) {
 				strbuf_setlen(tree_path, base_path_len);
-				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
+				strbuf_grow(tree_path, entry.pathlen + 1);
 				strbuf_add(tree_path, entry.path, entry.pathlen);
 				strbuf_addch(tree_path, '/');
 			}