diff mbox series

[v2] strvec: `strvec_splice()` to a statically initialized vector

Message ID 5bea9f20-eb0d-409d-8f37-f20697d6ce14@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v2] strvec: `strvec_splice()` to a statically initialized vector | expand

Commit Message

Rubén Justo Dec. 3, 2024, 7:47 p.m. UTC
We use a singleton empty array to initialize a `struct strvec`,
similar to the empty string singleton we use to initialize a `struct
strbuf`.

Note that an empty strvec instance (with zero elements) does not
necessarily need to be an instance initialized with the singleton.
Let's refer to strvec instances initialized with the singleton as
"empty-singleton" instances.

    As a side note, this is the current `strvec_pop()`:

    void strvec_pop(struct strvec *array)
    {
    	if (!array->nr)
    		return;
    	free((char *)array->v[array->nr - 1]);
    	array->v[array->nr - 1] = NULL;
    	array->nr--;
    }

    So, with `strvec_pop()` an instance can become empty but it does
    not going to be the an "empty-singleton".

This "empty-singleton" circumstance requires us to be careful when
adding elements to instances.  Specifically, when adding the first
element:  we detach the strvec instance from the singleton and set the
internal pointer in the instance to NULL.  After this point we apply
`realloc()` on the pointer.  We do this in `strvec_push_nodup()`, for
example.

The recently introduced `strvec_splice()` API is expected to be
normally used with non-empty strvec's.  However, it can also end up
being used with "empty-singleton" strvec's:

       struct strvec arr = STRVEC_INIT;
       int a = 0, b = 0;

       ... no modification to arr, a or b ...

       const char *rep[] = { "foo" };
       strvec_splice(&arr, a, b, rep, ARRAY_SIZE(rep));

So, we'll try to add elements to an "empty-singleton" strvec instance.

Avoid misapplying `realloc()` to the singleton in `strvec_splice()` by
adding a special case for "empty-singleton" strvec's.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
---

This iteration adds more detail to the message plus a minor change to
remove some unnecessary parentheses.

Junio: My message in the previous iteration was aimed at readers like
Patrick, who is also the author of `strvec_splice()`.  I certainly
assumed too much prior knowledge, which made the review unnecessarily
laborious.

Rereading what I wrote last night, perhaps the problem now is excess.
I hope not. In any case, here it is :-)

Thanks.

 strvec.c              | 10 ++++++----
 t/unit-tests/strvec.c | 10 ++++++++++
 2 files changed, 16 insertions(+), 4 deletions(-)


Range-diff against v1:
1:  0b60fcc51a ! 1:  c1991e6f3c strvec: `strvec_splice()` to a statically initialized vector
    @@ Metadata
      ## Commit message ##
         strvec: `strvec_splice()` to a statically initialized vector
     
    -    Let's avoid an invalid pointer error in case a client of
    -    `strvec_splice()` ends up with something similar to:
    +    We use a singleton empty array to initialize a `struct strvec`;
    +    similar to the empty string singleton we use to initialize a `struct
    +    strbuf`.
    +
    +    Note that an empty strvec instance (with zero elements) does not
    +    necessarily need to be an instance initialized with the singleton.
    +    Let's refer to strvec instances initialized with the singleton as
    +    "empty-singleton" instances.
    +
    +        As a side note, this is the current `strvec_pop()`:
    +
    +        void strvec_pop(struct strvec *array)
    +        {
    +            if (!array->nr)
    +                    return;
    +            free((char *)array->v[array->nr - 1]);
    +            array->v[array->nr - 1] = NULL;
    +            array->nr--;
    +        }
    +
    +        So, with `strvec_pop()` an instance can become empty but it does
    +        not going to be the an "empty-singleton".
    +
    +    This "empty-singleton" circumstance requires us to be careful when
    +    adding elements to instances.  Specifically, when adding the first
    +    element:  when we detach the strvec instance from the singleton and
    +    set the internal pointer in the instance to NULL.  After this point we
    +    apply `realloc()` on the pointer.  We do this in
    +    `strvec_push_nodup()`, for example.
    +
    +    The recently introduced `strvec_splice()` API is expected to be
    +    normally used with non-empty strvec's.  However, it can also end up
    +    being used with "empty-singleton" strvec's:
     
                struct strvec arr = STRVEC_INIT;
    +           int a = 0, b = 0;
    +
    +           ... no modification to arr, a or b ...
    +
                const char *rep[] = { "foo" };
    +           strvec_splice(&arr, a, b, rep, ARRAY_SIZE(rep));
    +
    +    So, we'll try to add elements to an "empty-singleton" strvec instance.
     
    -           strvec_splice(&arr, 0, 0, rep, ARRAY_SIZE(rep));
    +    Avoid misapplying `realloc()` to the singleton in `strvec_splice()` by
    +    adding a special case for strvec's initialized with the singleton.
     
         Signed-off-by: Rubén Justo <rjusto@gmail.com>
     
    @@ strvec.c: void strvec_splice(struct strvec *array, size_t idx, size_t len,
      			(array->nr - idx - len + 1) * sizeof(char *));
     -		array->nr += (replacement_len - len);
     -	}
    -+	array->nr += (replacement_len - len);
    ++	array->nr += replacement_len - len;
      	for (size_t i = 0; i < replacement_len; i++)
      		array->v[idx + i] = xstrdup(replacement[i]);
      }

Comments

Junio C Hamano Dec. 4, 2024, 12:09 a.m. UTC | #1
Rubén Justo <rjusto@gmail.com> writes:

> Note that an empty strvec instance (with zero elements) does not
> necessarily need to be an instance initialized with the singleton.

Correct.

When (vec.nr == 0), vec.v may be pointing at 

 (1) an allocated piece of memory, if the strvec was previously used
     to hold some strings; or
 (2) singleton array with NULL.

and vec.v[0] is NULL.  This is to allow you to pass vec.v[] as a
NULL terminated list of (char *) (aka argv[][]) to functions.

That can be said a bit differently and more concisely like so:

    A strvec instance with no elements can have its member .v
    pointing at empty_strvec[] or pointing at an allocated piece of
    memory, and either way .v[0] has NULL in it, to allow you to
    always treat vec.v[] as a NULL terminated array of strings, even
    immediately after initialization.

and then you can lose the strvec_pop() illustration below that talks
about an allocated piece of memory that was previously used.

> The recently introduced `strvec_splice()` API is expected to be
> normally used with non-empty strvec's.

It is perfectly sensible to expect that you can splice your stuff
into an empty strvec, so all this sentence is saying is that a
strvec is more often non-empty than empty. I'd recommend dropping
this sentence.

Something like

    When growing a strvec, we'd use a realloc() call on its .v[]
    member, but a care must be taken when it is pointing at
    empty_strvec[] and is not pointing at an allocated piece of
    memory.  strvec_push_nodup() and strvec_push() correctly do so.
    The recently added strvec_splice() forgot to.

should be sufficient.  Notice that I didn't have to invent a new
term "empty-singleton" at all ;-).

Thanks.
Rubén Justo Dec. 4, 2024, 1:08 a.m. UTC | #2
On Wed, Dec 04, 2024 at 09:09:36AM +0900, Junio C Hamano wrote:

> > The recently introduced `strvec_splice()` API is expected to be
> > normally used with non-empty strvec's.
> 
> It is perfectly sensible to expect that you can splice your stuff
> into an empty strvec, so all this sentence is saying is that a
> strvec is more often non-empty than empty.

I also wanted to introduce a reason why we might have overlooked
making `strvec_splice()` aware of the singleton object, without using
a verb like "forget".

> Notice that I didn't have to invent a new
> term "empty-singleton" at all ;-).

:-D

In my defense, I wrote the message late last night and was already
tired.  And when I read it today, it didn't seem so bad to me, in the
context of the message.

Thanks.
Junio C Hamano Dec. 4, 2024, 7:41 a.m. UTC | #3
This is queued as rj/strvec-splice-fix, and t/unit-tests/bin/unit-tests
dies of leaks under leak-check.



$ t/unit-tests/bin/unit-tests
TAP version 13
# start of suite 1: ctype
ok 1 - ctype::isspace
ok 2 - ctype::isdigit
ok 3 - ctype::isalpha
ok 4 - ctype::isalnum
ok 5 - ctype::is_glob_special
ok 6 - ctype::is_regex_special
ok 7 - ctype::is_pathspec_magic
ok 8 - ctype::isascii
ok 9 - ctype::islower
ok 10 - ctype::isupper
ok 11 - ctype::iscntrl
ok 12 - ctype::ispunct
ok 13 - ctype::isxdigit
ok 14 - ctype::isprint
# start of suite 2: strvec
ok 15 - strvec::init
ok 16 - strvec::dynamic_init
ok 17 - strvec::clear
ok 18 - strvec::push
ok 19 - strvec::pushf
ok 20 - strvec::pushl
ok 21 - strvec::pushv
not ok 22 - strvec::splice_just_initialized_strvec
    ---
    reason: |
      String mismatch: (&vec)->v[i] != expect[i]
      'bar' != '(null)'
    at:
      file: 't/unit-tests/strvec.c'
      line: 97
      function: 'test_strvec__splice_just_initialized_strvec'
    ---
ok 23 - strvec::splice_with_same_size_replacement
ok 24 - strvec::splice_with_smaller_replacement
ok 25 - strvec::splice_with_bigger_replacement
ok 26 - strvec::splice_with_empty_replacement
ok 27 - strvec::splice_with_empty_original
ok 28 - strvec::splice_at_tail
ok 29 - strvec::replace_at_head
ok 30 - strvec::replace_at_tail
ok 31 - strvec::replace_in_between
ok 32 - strvec::replace_with_substring
ok 33 - strvec::remove_at_head
ok 34 - strvec::remove_at_tail
ok 35 - strvec::remove_in_between
ok 36 - strvec::pop_empty_array
ok 37 - strvec::pop_non_empty_array
ok 38 - strvec::split_empty_string
ok 39 - strvec::split_single_item
ok 40 - strvec::split_multiple_items
ok 41 - strvec::split_whitespace_only
ok 42 - strvec::split_multiple_consecutive_whitespaces
ok 43 - strvec::detach

=================================================================
==5178==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 192 byte(s) in 1 object(s) allocated from:
    #0 0x5600496ec825 in __interceptor_realloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x67825) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
    #1 0x56004973b4cd in xrealloc /usr/local/google/home/jch/w/git.git/wrapper.c:140:8
    #2 0x560049714c6f in strvec_splice /usr/local/google/home/jch/w/git.git/strvec.c:67:3
    #3 0x5600496f0c1d in test_strvec__splice_just_initialized_strvec /usr/local/google/home/jch/w/git.git/t/unit-tests/strvec.c:96:2
    #4 0x5600496f627b in clar_run_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:315:3
    #5 0x5600496f46fa in clar_run_suite /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:412:3
    #6 0x5600496f43e1 in clar_test_run /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:608:4
    #7 0x5600496f4bdf in clar_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:651:11
    #8 0x5600496f787c in cmd_main /usr/local/google/home/jch/w/git.git/t/unit-tests/unit-test.c:42:8
    #9 0x5600496f793a in main /usr/local/google/home/jch/w/git.git/common-main.c:9:11
    #10 0x7f59ea91dc89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x5600496ec640 in __interceptor_calloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x67640) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
    #1 0x5600496f4cee in clar__fail /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:687:15
    #2 0x5600496f5f25 in clar__assert_equal /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:844:3
    #3 0x5600496f0db6 in test_strvec__splice_just_initialized_strvec /usr/local/google/home/jch/w/git.git/t/unit-tests/strvec.c:97:2
    #4 0x5600496f627b in clar_run_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:315:3
    #5 0x5600496f46fa in clar_run_suite /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:412:3
    #6 0x5600496f43e1 in clar_test_run /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:608:4
    #7 0x5600496f4bdf in clar_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:651:11
    #8 0x5600496f787c in cmd_main /usr/local/google/home/jch/w/git.git/t/unit-tests/unit-test.c:42:8
    #9 0x5600496f793a in main /usr/local/google/home/jch/w/git.git/common-main.c:9:11
    #10 0x7f59ea91dc89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Indirect leak of 18 byte(s) in 1 object(s) allocated from:
    #0 0x5600496ec3c6 in __interceptor_malloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x673c6) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
    #1 0x7f59ea9964f9 in strdup string/strdup.c:42:15
    #2 0x296c6c756e28271f  (<unknown module>)

Indirect leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x5600496ec3c6 in __interceptor_malloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x673c6) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
    #1 0x7f59ea9964f9 in strdup string/strdup.c:42:15

SUMMARY: LeakSanitizer: 262 byte(s) leaked in 4 allocation(s).
Rubén Justo Dec. 4, 2024, 8:46 a.m. UTC | #4
On Wed, Dec 04, 2024 at 04:41:36PM +0900, Junio C Hamano wrote:
> This is queued as rj/strvec-splice-fix, and t/unit-tests/bin/unit-tests
> dies of leaks under leak-check.

Right! We need this:

diff --git a/strvec.c b/strvec.c
index 087c020f5b..b1e6c5d8cd 100644
--- a/strvec.c
+++ b/strvec.c
@@ -66,6 +66,7 @@ void strvec_splice(struct strvec *array, size_t idx, size_t len,
                        array->v = NULL;
                ALLOC_GROW(array->v, array->nr + (replacement_len - len) + 1,
                           array->alloc);
+               array->v[array->nr + 1] = NULL;
        }
        for (size_t i = 0; i < len; i++)
                free((char *)array->v[idx + i]);

Sorry.  I'll re-roll later today.

> 
> 
> 
> $ t/unit-tests/bin/unit-tests
> TAP version 13
> # start of suite 1: ctype
> ok 1 - ctype::isspace
> ok 2 - ctype::isdigit
> ok 3 - ctype::isalpha
> ok 4 - ctype::isalnum
> ok 5 - ctype::is_glob_special
> ok 6 - ctype::is_regex_special
> ok 7 - ctype::is_pathspec_magic
> ok 8 - ctype::isascii
> ok 9 - ctype::islower
> ok 10 - ctype::isupper
> ok 11 - ctype::iscntrl
> ok 12 - ctype::ispunct
> ok 13 - ctype::isxdigit
> ok 14 - ctype::isprint
> # start of suite 2: strvec
> ok 15 - strvec::init
> ok 16 - strvec::dynamic_init
> ok 17 - strvec::clear
> ok 18 - strvec::push
> ok 19 - strvec::pushf
> ok 20 - strvec::pushl
> ok 21 - strvec::pushv
> not ok 22 - strvec::splice_just_initialized_strvec
>     ---
>     reason: |
>       String mismatch: (&vec)->v[i] != expect[i]
>       'bar' != '(null)'
>     at:
>       file: 't/unit-tests/strvec.c'
>       line: 97
>       function: 'test_strvec__splice_just_initialized_strvec'
>     ---
> ok 23 - strvec::splice_with_same_size_replacement
> ok 24 - strvec::splice_with_smaller_replacement
> ok 25 - strvec::splice_with_bigger_replacement
> ok 26 - strvec::splice_with_empty_replacement
> ok 27 - strvec::splice_with_empty_original
> ok 28 - strvec::splice_at_tail
> ok 29 - strvec::replace_at_head
> ok 30 - strvec::replace_at_tail
> ok 31 - strvec::replace_in_between
> ok 32 - strvec::replace_with_substring
> ok 33 - strvec::remove_at_head
> ok 34 - strvec::remove_at_tail
> ok 35 - strvec::remove_in_between
> ok 36 - strvec::pop_empty_array
> ok 37 - strvec::pop_non_empty_array
> ok 38 - strvec::split_empty_string
> ok 39 - strvec::split_single_item
> ok 40 - strvec::split_multiple_items
> ok 41 - strvec::split_whitespace_only
> ok 42 - strvec::split_multiple_consecutive_whitespaces
> ok 43 - strvec::detach
> 
> =================================================================
> ==5178==ERROR: LeakSanitizer: detected memory leaks
> 
> Direct leak of 192 byte(s) in 1 object(s) allocated from:
>     #0 0x5600496ec825 in __interceptor_realloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x67825) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
>     #1 0x56004973b4cd in xrealloc /usr/local/google/home/jch/w/git.git/wrapper.c:140:8
>     #2 0x560049714c6f in strvec_splice /usr/local/google/home/jch/w/git.git/strvec.c:67:3
>     #3 0x5600496f0c1d in test_strvec__splice_just_initialized_strvec /usr/local/google/home/jch/w/git.git/t/unit-tests/strvec.c:96:2
>     #4 0x5600496f627b in clar_run_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:315:3
>     #5 0x5600496f46fa in clar_run_suite /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:412:3
>     #6 0x5600496f43e1 in clar_test_run /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:608:4
>     #7 0x5600496f4bdf in clar_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:651:11
>     #8 0x5600496f787c in cmd_main /usr/local/google/home/jch/w/git.git/t/unit-tests/unit-test.c:42:8
>     #9 0x5600496f793a in main /usr/local/google/home/jch/w/git.git/common-main.c:9:11
>     #10 0x7f59ea91dc89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
> 
> Direct leak of 48 byte(s) in 1 object(s) allocated from:
>     #0 0x5600496ec640 in __interceptor_calloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x67640) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
>     #1 0x5600496f4cee in clar__fail /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:687:15
>     #2 0x5600496f5f25 in clar__assert_equal /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:844:3
>     #3 0x5600496f0db6 in test_strvec__splice_just_initialized_strvec /usr/local/google/home/jch/w/git.git/t/unit-tests/strvec.c:97:2
>     #4 0x5600496f627b in clar_run_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:315:3
>     #5 0x5600496f46fa in clar_run_suite /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:412:3
>     #6 0x5600496f43e1 in clar_test_run /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:608:4
>     #7 0x5600496f4bdf in clar_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:651:11
>     #8 0x5600496f787c in cmd_main /usr/local/google/home/jch/w/git.git/t/unit-tests/unit-test.c:42:8
>     #9 0x5600496f793a in main /usr/local/google/home/jch/w/git.git/common-main.c:9:11
>     #10 0x7f59ea91dc89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
> 
> Indirect leak of 18 byte(s) in 1 object(s) allocated from:
>     #0 0x5600496ec3c6 in __interceptor_malloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x673c6) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
>     #1 0x7f59ea9964f9 in strdup string/strdup.c:42:15
>     #2 0x296c6c756e28271f  (<unknown module>)
> 
> Indirect leak of 4 byte(s) in 1 object(s) allocated from:
>     #0 0x5600496ec3c6 in __interceptor_malloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x673c6) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
>     #1 0x7f59ea9964f9 in strdup string/strdup.c:42:15
> 
> SUMMARY: LeakSanitizer: 262 byte(s) leaked in 4 allocation(s).
Rubén Justo Dec. 4, 2024, 8:50 a.m. UTC | #5
On 12/4/24 9:46 AM, Rubén Justo wrote:
> On Wed, Dec 04, 2024 at 04:41:36PM +0900, Junio C Hamano wrote:
>> This is queued as rj/strvec-splice-fix, and t/unit-tests/bin/unit-tests
>> dies of leaks under leak-check.
> 
> Right! We need this:
> 
> diff --git a/strvec.c b/strvec.c
> index 087c020f5b..b1e6c5d8cd 100644
> --- a/strvec.c
> +++ b/strvec.c
> @@ -66,6 +66,7 @@ void strvec_splice(struct strvec *array, size_t idx, size_t len,
>                         array->v = NULL;
>                 ALLOC_GROW(array->v, array->nr + (replacement_len - len) + 1,
>                            array->alloc);
> +               array->v[array->nr + 1] = NULL;

I mean:

+               array->v[array->nr + (replacement_len - len) + 1] = NULL;


>         }
>         for (size_t i = 0; i < len; i++)
>                 free((char *)array->v[idx + i]);
> 
> Sorry.  I'll re-roll later today.
> 
>>
>>
>>
>> $ t/unit-tests/bin/unit-tests
>> TAP version 13
>> # start of suite 1: ctype
>> ok 1 - ctype::isspace
>> ok 2 - ctype::isdigit
>> ok 3 - ctype::isalpha
>> ok 4 - ctype::isalnum
>> ok 5 - ctype::is_glob_special
>> ok 6 - ctype::is_regex_special
>> ok 7 - ctype::is_pathspec_magic
>> ok 8 - ctype::isascii
>> ok 9 - ctype::islower
>> ok 10 - ctype::isupper
>> ok 11 - ctype::iscntrl
>> ok 12 - ctype::ispunct
>> ok 13 - ctype::isxdigit
>> ok 14 - ctype::isprint
>> # start of suite 2: strvec
>> ok 15 - strvec::init
>> ok 16 - strvec::dynamic_init
>> ok 17 - strvec::clear
>> ok 18 - strvec::push
>> ok 19 - strvec::pushf
>> ok 20 - strvec::pushl
>> ok 21 - strvec::pushv
>> not ok 22 - strvec::splice_just_initialized_strvec
>>     ---
>>     reason: |
>>       String mismatch: (&vec)->v[i] != expect[i]
>>       'bar' != '(null)'
>>     at:
>>       file: 't/unit-tests/strvec.c'
>>       line: 97
>>       function: 'test_strvec__splice_just_initialized_strvec'
>>     ---
>> ok 23 - strvec::splice_with_same_size_replacement
>> ok 24 - strvec::splice_with_smaller_replacement
>> ok 25 - strvec::splice_with_bigger_replacement
>> ok 26 - strvec::splice_with_empty_replacement
>> ok 27 - strvec::splice_with_empty_original
>> ok 28 - strvec::splice_at_tail
>> ok 29 - strvec::replace_at_head
>> ok 30 - strvec::replace_at_tail
>> ok 31 - strvec::replace_in_between
>> ok 32 - strvec::replace_with_substring
>> ok 33 - strvec::remove_at_head
>> ok 34 - strvec::remove_at_tail
>> ok 35 - strvec::remove_in_between
>> ok 36 - strvec::pop_empty_array
>> ok 37 - strvec::pop_non_empty_array
>> ok 38 - strvec::split_empty_string
>> ok 39 - strvec::split_single_item
>> ok 40 - strvec::split_multiple_items
>> ok 41 - strvec::split_whitespace_only
>> ok 42 - strvec::split_multiple_consecutive_whitespaces
>> ok 43 - strvec::detach
>>
>> =================================================================
>> ==5178==ERROR: LeakSanitizer: detected memory leaks
>>
>> Direct leak of 192 byte(s) in 1 object(s) allocated from:
>>     #0 0x5600496ec825 in __interceptor_realloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x67825) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
>>     #1 0x56004973b4cd in xrealloc /usr/local/google/home/jch/w/git.git/wrapper.c:140:8
>>     #2 0x560049714c6f in strvec_splice /usr/local/google/home/jch/w/git.git/strvec.c:67:3
>>     #3 0x5600496f0c1d in test_strvec__splice_just_initialized_strvec /usr/local/google/home/jch/w/git.git/t/unit-tests/strvec.c:96:2
>>     #4 0x5600496f627b in clar_run_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:315:3
>>     #5 0x5600496f46fa in clar_run_suite /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:412:3
>>     #6 0x5600496f43e1 in clar_test_run /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:608:4
>>     #7 0x5600496f4bdf in clar_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:651:11
>>     #8 0x5600496f787c in cmd_main /usr/local/google/home/jch/w/git.git/t/unit-tests/unit-test.c:42:8
>>     #9 0x5600496f793a in main /usr/local/google/home/jch/w/git.git/common-main.c:9:11
>>     #10 0x7f59ea91dc89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
>>
>> Direct leak of 48 byte(s) in 1 object(s) allocated from:
>>     #0 0x5600496ec640 in __interceptor_calloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x67640) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
>>     #1 0x5600496f4cee in clar__fail /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:687:15
>>     #2 0x5600496f5f25 in clar__assert_equal /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:844:3
>>     #3 0x5600496f0db6 in test_strvec__splice_just_initialized_strvec /usr/local/google/home/jch/w/git.git/t/unit-tests/strvec.c:97:2
>>     #4 0x5600496f627b in clar_run_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:315:3
>>     #5 0x5600496f46fa in clar_run_suite /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:412:3
>>     #6 0x5600496f43e1 in clar_test_run /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:608:4
>>     #7 0x5600496f4bdf in clar_test /usr/local/google/home/jch/w/git.git/t/unit-tests/clar/clar.c:651:11
>>     #8 0x5600496f787c in cmd_main /usr/local/google/home/jch/w/git.git/t/unit-tests/unit-test.c:42:8
>>     #9 0x5600496f793a in main /usr/local/google/home/jch/w/git.git/common-main.c:9:11
>>     #10 0x7f59ea91dc89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
>>
>> Indirect leak of 18 byte(s) in 1 object(s) allocated from:
>>     #0 0x5600496ec3c6 in __interceptor_malloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x673c6) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
>>     #1 0x7f59ea9964f9 in strdup string/strdup.c:42:15
>>     #2 0x296c6c756e28271f  (<unknown module>)
>>
>> Indirect leak of 4 byte(s) in 1 object(s) allocated from:
>>     #0 0x5600496ec3c6 in __interceptor_malloc (/usr/local/google/home/jch/w/git.git/t/unit-tests/bin/unit-tests+0x673c6) (BuildId: 6efbef9c6f87bfa879e770b463031b396d4d5efe)
>>     #1 0x7f59ea9964f9 in strdup string/strdup.c:42:15
>>
>> SUMMARY: LeakSanitizer: 262 byte(s) leaked in 4 allocation(s).
Junio C Hamano Dec. 4, 2024, 10:15 a.m. UTC | #6
Rubén Justo <rjusto@gmail.com> writes:

>> @@ -66,6 +66,7 @@ void strvec_splice(struct strvec *array, size_t idx, size_t len,
>>                         array->v = NULL;
>>                 ALLOC_GROW(array->v, array->nr + (replacement_len - len) + 1,
>>                            array->alloc);
>> +               array->v[array->nr + 1] = NULL;
>
> I mean:
>
> +               array->v[array->nr + (replacement_len - len) + 1] = NULL;
>
>
>>         }
>>         for (size_t i = 0; i < len; i++)
>>                 free((char *)array->v[idx + i]);
>> 
>> Sorry.  I'll re-roll later today.

No need to say "sorry".  Thanks for quickly reacting and starting to
work on it.
karthik nayak Dec. 4, 2024, 11:26 a.m. UTC | #7
Rubén Justo <rjusto@gmail.com> writes:

Nit: Is the commit subject missing a verb?

> We use a singleton empty array to initialize a `struct strvec`,
> similar to the empty string singleton we use to initialize a `struct
> strbuf`.
>

So a singleton empty array is a statically allocated array element, so
for strvec, this would be `const char *empty_strvec[] = { NULL }`.

> Note that an empty strvec instance (with zero elements) does not
> necessarily need to be an instance initialized with the singleton.
> Let's refer to strvec instances initialized with the singleton as
> "empty-singleton" instances.
>

Right, so when we add elements ideally, we ideally check whether it is a
singleton or not. This is evident in `strvec_push_nodup()`:

    void strvec_push_nodup(struct strvec *array, char *value)
    {
    	if (array->v == empty_strvec)
    		array->v = NULL;

    	ALLOC_GROW(array->v, array->nr + 2, array->alloc);
    	array->v[array->nr++] = value;
    	array->v[array->nr] = NULL;
    }

>     As a side note, this is the current `strvec_pop()`:
>
>     void strvec_pop(struct strvec *array)
>     {
>     	if (!array->nr)
>     		return;
>     	free((char *)array->v[array->nr - 1]);
>     	array->v[array->nr - 1] = NULL;
>     	array->nr--;
>     }
>
>     So, with `strvec_pop()` an instance can become empty but it does
>     not going to be the an "empty-singleton".

Correct, since we simply set the array element to NULL, but this is
still a dynamically allocated array.

Nit: The sentence reads a bit weirdly.

> This "empty-singleton" circumstance requires us to be careful when
> adding elements to instances.  Specifically, when adding the first
> element:  we detach the strvec instance from the singleton and set the
> internal pointer in the instance to NULL.  After this point we apply
> `realloc()` on the pointer.  We do this in `strvec_push_nodup()`, for
> example.
>
> The recently introduced `strvec_splice()` API is expected to be
> normally used with non-empty strvec's.  However, it can also end up
> being used with "empty-singleton" strvec's:
>
>        struct strvec arr = STRVEC_INIT;
>        int a = 0, b = 0;
>
>        ... no modification to arr, a or b ...
>
>        const char *rep[] = { "foo" };
>        strvec_splice(&arr, a, b, rep, ARRAY_SIZE(rep));
>
> So, we'll try to add elements to an "empty-singleton" strvec instance.
>
> Avoid misapplying `realloc()` to the singleton in `strvec_splice()` by
> adding a special case for "empty-singleton" strvec's.
>

So everything said here makes sense, that's a great explanation.

> Signed-off-by: Rubén Justo <rjusto@gmail.com>
> ---
>
> This iteration adds more detail to the message plus a minor change to
> remove some unnecessary parentheses.
>
> Junio: My message in the previous iteration was aimed at readers like
> Patrick, who is also the author of `strvec_splice()`.  I certainly
> assumed too much prior knowledge, which made the review unnecessarily
> laborious.
>
> Rereading what I wrote last night, perhaps the problem now is excess.
> I hope not. In any case, here it is :-)
>

I would say this is very useful over the first iteration, considering I
am someone without prior knowledge here.

> Thanks.
>
>  strvec.c              | 10 ++++++----
>  t/unit-tests/strvec.c | 10 ++++++++++
>  2 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/strvec.c b/strvec.c
> index d1cf4e2496..087c020f5b 100644
> --- a/strvec.c
> +++ b/strvec.c
> @@ -61,16 +61,18 @@ void strvec_splice(struct strvec *array, size_t idx, size_t len,
>  {
>  	if (idx + len > array->nr)
>  		BUG("range outside of array boundary");
> -	if (replacement_len > len)
> +	if (replacement_len > len) {
> +		if (array->v == empty_strvec)
> +			array->v = NULL;
>  		ALLOC_GROW(array->v, array->nr + (replacement_len - len) + 1,
>  			   array->alloc);
> +	}
>  	for (size_t i = 0; i < len; i++)
>  		free((char *)array->v[idx + i]);
> -	if (replacement_len != len) {
> +	if ((replacement_len != len) && array->nr)
>  		memmove(array->v + idx + replacement_len, array->v + idx + len,
>  			(array->nr - idx - len + 1) * sizeof(char *));
> -		array->nr += (replacement_len - len);
> -	}
> +	array->nr += replacement_len - len;

Why is this second block of changes needed? Will array-nr ever be 0 when
we reach here?

>  	for (size_t i = 0; i < replacement_len; i++)
>  		array->v[idx + i] = xstrdup(replacement[i]);
>  }

[snip]
Rubén Justo Dec. 4, 2024, 10:22 p.m. UTC | #8
On Wed, Dec 04, 2024 at 11:26:27AM +0000, karthik nayak wrote:

> Nit: Is the commit subject missing a verb?

I guess something like "To strvec_splice" sounded good in my head 
:)

> 
> > We use a singleton empty array to initialize a `struct strvec`,
> > similar to the empty string singleton we use to initialize a `struct
> > strbuf`.
> >
> 
> So a singleton empty array is a statically allocated array element, so
> for strvec, this would be `const char *empty_strvec[] = { NULL }`.
> 
> > Note that an empty strvec instance (with zero elements) does not
> > necessarily need to be an instance initialized with the singleton.
> > Let's refer to strvec instances initialized with the singleton as
> > "empty-singleton" instances.
> >
> 
> Right, so when we add elements ideally, we ideally check whether it is a
> singleton or not. This is evident in `strvec_push_nodup()`:
> 
>     void strvec_push_nodup(struct strvec *array, char *value)
>     {
>     	if (array->v == empty_strvec)
>     		array->v = NULL;
> 
>     	ALLOC_GROW(array->v, array->nr + 2, array->alloc);
>     	array->v[array->nr++] = value;
>     	array->v[array->nr] = NULL;
>     }
> 
> >     As a side note, this is the current `strvec_pop()`:
> >
> >     void strvec_pop(struct strvec *array)
> >     {
> >     	if (!array->nr)
> >     		return;
> >     	free((char *)array->v[array->nr - 1]);
> >     	array->v[array->nr - 1] = NULL;
> >     	array->nr--;
> >     }
> >
> >     So, with `strvec_pop()` an instance can become empty but it does
> >     not going to be the an "empty-singleton".
> 
> Correct, since we simply set the array element to NULL, but this is
> still a dynamically allocated array.
> 
> Nit: The sentence reads a bit weirdly.
> 
> > This "empty-singleton" circumstance requires us to be careful when
> > adding elements to instances.  Specifically, when adding the first
> > element:  we detach the strvec instance from the singleton and set the
> > internal pointer in the instance to NULL.  After this point we apply
> > `realloc()` on the pointer.  We do this in `strvec_push_nodup()`, for
> > example.
> >
> > The recently introduced `strvec_splice()` API is expected to be
> > normally used with non-empty strvec's.  However, it can also end up
> > being used with "empty-singleton" strvec's:
> >
> >        struct strvec arr = STRVEC_INIT;
> >        int a = 0, b = 0;
> >
> >        ... no modification to arr, a or b ...
> >
> >        const char *rep[] = { "foo" };
> >        strvec_splice(&arr, a, b, rep, ARRAY_SIZE(rep));
> >
> > So, we'll try to add elements to an "empty-singleton" strvec instance.
> >
> > Avoid misapplying `realloc()` to the singleton in `strvec_splice()` by
> > adding a special case for "empty-singleton" strvec's.
> >
> 
> So everything said here makes sense, that's a great explanation.

Thanks.

> 
> > Signed-off-by: Rubén Justo <rjusto@gmail.com>
> > ---
> >
> > This iteration adds more detail to the message plus a minor change to
> > remove some unnecessary parentheses.
> >
> > Junio: My message in the previous iteration was aimed at readers like
> > Patrick, who is also the author of `strvec_splice()`.  I certainly
> > assumed too much prior knowledge, which made the review unnecessarily
> > laborious.
> >
> > Rereading what I wrote last night, perhaps the problem now is excess.
> > I hope not. In any case, here it is :-)
> >
> 
> I would say this is very useful over the first iteration, considering I
> am someone without prior knowledge here.

I'm glad to read that.  I guess Junio is to blame ;)  Thanks.

> 
> > Thanks.
> >
> >  strvec.c              | 10 ++++++----
> >  t/unit-tests/strvec.c | 10 ++++++++++
> >  2 files changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/strvec.c b/strvec.c
> > index d1cf4e2496..087c020f5b 100644
> > --- a/strvec.c
> > +++ b/strvec.c
> > @@ -61,16 +61,18 @@ void strvec_splice(struct strvec *array, size_t idx, size_t len,
> >  {
> >  	if (idx + len > array->nr)
> >  		BUG("range outside of array boundary");
> > -	if (replacement_len > len)
> > +	if (replacement_len > len) {
> > +		if (array->v == empty_strvec)
> > +			array->v = NULL;
> >  		ALLOC_GROW(array->v, array->nr + (replacement_len - len) + 1,
> >  			   array->alloc);
> > +	}
> >  	for (size_t i = 0; i < len; i++)
> >  		free((char *)array->v[idx + i]);
> > -	if (replacement_len != len) {
> > +	if ((replacement_len != len) && array->nr)
> >  		memmove(array->v + idx + replacement_len, array->v + idx + len,
> >  			(array->nr - idx - len + 1) * sizeof(char *));
> > -		array->nr += (replacement_len - len);
> > -	}
> > +	array->nr += replacement_len - len;
> 
> Why is this second block of changes needed? Will array-nr ever be 0 when
> we reach here?

I'm not sure I understand your questions.

At that point, `array->nr` is the initial number of entries in the
vector.  It can be 0 when `strvec_splice()` is applied to an empty
vector.

We are moving the line where we update "array->nr" outside the `if`
block because we want to do it even when we are not moving existing
entries.  Again, this happens when `strvec_splice()` is applied to an
empty vector.

Finally, we don't mind too much (or value more the simplicity) of the
now unconditional update of "array->nr" because a clever compiler will
give us the third arm of the if: "else -> do nothing".  When
`replacement_len == len` => "array->nr += 0" => do nothing.

> 
> >  	for (size_t i = 0; i < replacement_len; i++)
> >  		array->v[idx + i] = xstrdup(replacement[i]);
> >  }
> 
> [snip]

Thank you for your review.
diff mbox series

Patch

diff --git a/strvec.c b/strvec.c
index d1cf4e2496..087c020f5b 100644
--- a/strvec.c
+++ b/strvec.c
@@ -61,16 +61,18 @@  void strvec_splice(struct strvec *array, size_t idx, size_t len,
 {
 	if (idx + len > array->nr)
 		BUG("range outside of array boundary");
-	if (replacement_len > len)
+	if (replacement_len > len) {
+		if (array->v == empty_strvec)
+			array->v = NULL;
 		ALLOC_GROW(array->v, array->nr + (replacement_len - len) + 1,
 			   array->alloc);
+	}
 	for (size_t i = 0; i < len; i++)
 		free((char *)array->v[idx + i]);
-	if (replacement_len != len) {
+	if ((replacement_len != len) && array->nr)
 		memmove(array->v + idx + replacement_len, array->v + idx + len,
 			(array->nr - idx - len + 1) * sizeof(char *));
-		array->nr += (replacement_len - len);
-	}
+	array->nr += replacement_len - len;
 	for (size_t i = 0; i < replacement_len; i++)
 		array->v[idx + i] = xstrdup(replacement[i]);
 }
diff --git a/t/unit-tests/strvec.c b/t/unit-tests/strvec.c
index 855b602337..e66b7bbfae 100644
--- a/t/unit-tests/strvec.c
+++ b/t/unit-tests/strvec.c
@@ -88,6 +88,16 @@  void test_strvec__pushv(void)
 	strvec_clear(&vec);
 }
 
+void test_strvec__splice_just_initialized_strvec(void)
+{
+	struct strvec vec = STRVEC_INIT;
+	const char *replacement[] = { "foo" };
+
+	strvec_splice(&vec, 0, 0, replacement, ARRAY_SIZE(replacement));
+	check_strvec(&vec, "foo", NULL);
+	strvec_clear(&vec);
+}
+
 void test_strvec__splice_with_same_size_replacement(void)
 {
 	struct strvec vec = STRVEC_INIT;