Message ID | 20250217-pks-update-ref-optimization-v1-7-a2b6d87a24af@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | refs: batch refname availability checks | expand |
On Mon, Feb 17, 2025 at 04:50:21PM +0100, Patrick Steinhardt wrote: > The ref and reflog iterators have their lifecycle attached to iteration: > once the iterator reaches its end, it is automatically released and the > caller doesn't have to care about that anymore. When the iterator should > be released before it has been exhausted, callers must explicitly abort > the iterator via `ref_iterator_abort()`. > > This lifecycle is somewhat unusual in the Git codebase and creates two > problems: > > - Callsites need to be very careful about when exactly they call > `ref_iterator_abort()`, as calling the function is only valid when > the iterator itself still is. This leads to somewhat awkward calling > patterns in some situations. > In what situations, the iterator has been disappeared when we call `ref_iterator_abort`? Why is this awkward? When reading, I am really curious about this. > - It is impossible to reuse iterators and re-seek them to a different > prefix. This feature isn't supported by any iterator implementation > except for the reftable iterators anyway, but if it was implemented > it would allow us to optimize cases where we need to search for > specific references repeatedly by reusing internal state. > So, the reason why we cannot reuse the iterator is that we will deallocate the iterator? So, below we want to detangle the lifecycle. I don't know whether my understanding is correct. > Detangle the lifecycle from iteration so that we don't deallocate the > iterator anymore once it is exhausted. Instead, callers are now expected > to always call a newly introduce `ref_iterator_free()` function that > deallocates the iterator and its internal state. > A design question: why do not just introduce a variable, for example, `unsigned int free` to indicate whether we need to free the iterator? > While at it, drop the return value of `ref_iterator_abort()`, which > wasn't really required by any of the iterator implementations anyway. > Furthermore, stop calling `base_ref_iterator_free()` in any of the > backends, but instead call it in `ref_iterator_free()`. > > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > builtin/clone.c | 2 + > dir-iterator.c | 24 +++++------ > dir-iterator.h | 13 ++---- > refs.c | 7 +++- > refs/debug.c | 9 ++--- > refs/files-backend.c | 36 +++++------------ > refs/iterator.c | 95 ++++++++++++++------------------------------ > refs/packed-backend.c | 27 ++++++------- > refs/ref-cache.c | 9 ++--- > refs/refs-internal.h | 31 +++++---------- > refs/reftable-backend.c | 34 ++++------------ > t/helper/test-dir-iterator.c | 1 + > 12 files changed, 99 insertions(+), 189 deletions(-) > > diff --git a/builtin/clone.c b/builtin/clone.c > index fd001d800c6..ac3e84b2b18 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest, > strbuf_setlen(src, src_len); > die(_("failed to iterate over '%s'"), src->buf); > } > + > + dir_iterator_free(iter); Here, we explicitly free the iterator. This is a must, because we will never free the iterator anymore. > } > > static void clone_local(const char *src_repo, const char *dest_repo) > diff --git a/dir-iterator.c b/dir-iterator.c > index de619846f29..857e1d9bdaf 100644 > --- a/dir-iterator.c > +++ b/dir-iterator.c > @@ -193,9 +193,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) > > if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) { > if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC) > - goto error_out; > + return ITER_ERROR; Here, we just return `ITER_ERROR` instead of going `error_out` label which will call `dir_iterator_abort` function to free the iterator, we should make the caller free. > if (iter->levels_nr == 0) > - goto error_out; > + return ITER_ERROR; > } > > /* Loop until we find an entry that we can give back to the caller. */ > @@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) > int ret = next_directory_entry(level->dir, iter->base.path.buf, &de); > if (ret < 0) { > if (iter->flags & DIR_ITERATOR_PEDANTIC) > - goto error_out; > + return ITER_ERROR; > continue; > } else if (ret > 0) { > if (pop_level(iter) == 0) > - return dir_iterator_abort(dir_iterator); > + return ITER_DONE; Instead of calling `dir_iterator_abort`, we just return `ITER_DONE`. However, this does not make sense. We break the semantics of `ITER_DONE`. Let me cite the comment from "iterator.h": /* * The iterator is exhausted and has been freed. */ #define ITER_DONE -1 However, we don't free the iterator here. > continue; > } > > @@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) > } else { > if (level->entries_idx >= level->entries.nr) { > if (pop_level(iter) == 0) > - return dir_iterator_abort(dir_iterator); > + return ITER_DONE; > continue; > } > > @@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) > > if (prepare_next_entry_data(iter, name)) { > if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC) > - goto error_out; > + return ITER_ERROR; > continue; > } > > return ITER_OK; > } > - > -error_out: > - dir_iterator_abort(dir_iterator); > - return ITER_ERROR; > } > > -int dir_iterator_abort(struct dir_iterator *dir_iterator) > +void dir_iterator_free(struct dir_iterator *dir_iterator) We rename `dir_iterator_abort` to `dir_iterator_free`. And we drop the return value. > { > struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator; > > + if (!iter) > + return; > + Make sense, because we have no idea whether the `dir_iterator` is exhausted. So we need to check whether it is valid. > for (; iter->levels_nr; iter->levels_nr--) { > struct dir_iterator_level *level = > &iter->levels[iter->levels_nr - 1]; > @@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator) > free(iter->levels); > strbuf_release(&iter->base.path); > free(iter); > - return ITER_DONE; I am confused that why we don't provide `dir_iterator_release` for dir iterator? For other ref related iterators, we will rename "*abort" to "*release" and drop the operation to free the iterator. However, for dir iterator, we rename its "*abort" to "*free". I think this does not make sense. It causes inconsistency. Although dir iterator does _not_ have "peel" method. But it does have "advance" and "abort" methods just like ref iterators. I think you have already considered this problem. I guess that's the reason why in the below comment you typed `dir_iterator_release` instead of `dir_iterator_free`. > } > > struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags) > @@ -301,7 +299,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags) > return dir_iterator; > > error_out: > - dir_iterator_abort(dir_iterator); > + dir_iterator_free(dir_iterator); > errno = saved_errno; > return NULL; > } > diff --git a/dir-iterator.h b/dir-iterator.h > index 6d438809b6e..01f51f6bac1 100644 > --- a/dir-iterator.h > +++ b/dir-iterator.h > @@ -27,10 +27,8 @@ > * goto error_handler; > * > * while ((ok = dir_iterator_advance(iter)) == ITER_OK) { > - * if (want_to_stop_iteration()) { > - * ok = dir_iterator_abort(iter); > + * if (want_to_stop_iteration()) > * break; > - * } Is this correct? If `want_to_stop_iteration()` is true, the `ok` will be `ITER_OK` and then it breaks the loop to jump to check whether the `ok` is `ITER_DONE`. Of course it is not, it will call `handle_error`. At least, we should assign `ok = ITER_DONE`. > * > * // Access information about the current path: > * if (S_ISDIR(iter->st.st_mode)) > @@ -39,6 +37,7 @@ > * > * if (ok != ITER_DONE) > * handle_error(); > + * dir_iterator_release(iter); I think this is a typo. Should `dir_iterator_release` be `dir_iterator_free`? > * > * Callers are allowed to modify iter->path while they are working, > * but they must restore it to its original contents before calling > @@ -107,11 +106,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags); > */ > int dir_iterator_advance(struct dir_iterator *iterator); > > -/* > - * End the iteration before it has been exhausted. Free the > - * dir_iterator and any associated resources and return ITER_DONE. On > - * error, free the dir_iterator and return ITER_ERROR. > - */ > -int dir_iterator_abort(struct dir_iterator *iterator); > +/* Free the dir_iterator and any associated resources. */ > +void dir_iterator_free(struct dir_iterator *iterator); > > #endif > diff --git a/refs.c b/refs.c > index eaf41421f50..8eff60a2186 100644 > --- a/refs.c > +++ b/refs.c > @@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs, > { > struct strbuf dirname = STRBUF_INIT; > struct strbuf referent = STRBUF_INIT; > + struct ref_iterator *iter = NULL; > struct strset dirnames; > int ret = -1; > > @@ -2552,7 +2553,6 @@ int refs_verify_refnames_available(struct ref_store *refs, > strbuf_addch(&dirname, '/'); > > if (!initial_transaction) { > - struct ref_iterator *iter; > int ok; > > iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0, > @@ -2564,12 +2564,14 @@ int refs_verify_refnames_available(struct ref_store *refs, > > strbuf_addf(err, _("'%s' exists; cannot create '%s'"), > iter->refname, refname); > - ref_iterator_abort(iter); > goto cleanup; > } > > if (ok != ITER_DONE) > BUG("error while iterating over references"); > + > + ref_iterator_free(iter); > + iter = NULL; > } > > extra_refname = find_descendant_ref(dirname.buf, extras, skip); > @@ -2586,6 +2588,7 @@ int refs_verify_refnames_available(struct ref_store *refs, > strbuf_release(&referent); > strbuf_release(&dirname); > strset_clear(&dirnames); > + ref_iterator_free(iter); > return ret; > } > > diff --git a/refs/debug.c b/refs/debug.c > index fbc4df08b43..a9786da4ba1 100644 > --- a/refs/debug.c > +++ b/refs/debug.c > @@ -179,19 +179,18 @@ static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator, > return res; > } > > -static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator) > +static void debug_ref_iterator_release(struct ref_iterator *ref_iterator) Here, we rename "*abort" to "*release" for debug ref and call the corresponding `release` method. > { > struct debug_ref_iterator *diter = > (struct debug_ref_iterator *)ref_iterator; > - int res = diter->iter->vtable->abort(diter->iter); > - trace_printf_key(&trace_refs, "iterator_abort: %d\n", res); > - return res; > + diter->iter->vtable->release(diter->iter); > + trace_printf_key(&trace_refs, "iterator_abort\n"); > } > > static struct ref_iterator_vtable debug_ref_iterator_vtable = { > .advance = debug_ref_iterator_advance, > .peel = debug_ref_iterator_peel, > - .abort = debug_ref_iterator_abort, > + .release = debug_ref_iterator_release, > }; > > static struct ref_iterator * > diff --git a/refs/files-backend.c b/refs/files-backend.c > index 29f08dced40..9511b6f3448 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -919,10 +919,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator) > return ITER_OK; > } > > - iter->iter0 = NULL; > - if (ref_iterator_abort(ref_iterator) != ITER_DONE) > - ok = ITER_ERROR; > - We don't abort the iterator in `advance`. Is this because we want to reuse this iterator? > return ok; > } > [snip] > @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator, > return ref_iterator_peel(iter->iter0, peeled); > } > > -int ref_iterator_abort(struct ref_iterator *ref_iterator) > +void ref_iterator_free(struct ref_iterator *ref_iterator) > { > - return ref_iterator->vtable->abort(ref_iterator); > + if (ref_iterator) { > + ref_iterator->vtable->release(ref_iterator); > + /* Help make use-after-free bugs fail quickly: */ > + ref_iterator->vtable = NULL; > + free(ref_iterator); > + } > } > So, when calling `ref_iterator_free`, we will call corresponding "release" method to release the resources associated and then we free the iterator. Would this be too complicated? From my view, we could just make the `ref_iterator_abort` name unchanged but add a new variable such as "unsigned int free_iterator". And we change each "abort" callback to avoid free the iterator. This would be much simpler. [snip] > diff --git a/refs/ref-cache.c b/refs/ref-cache.c > index 02f09e4df88..6457e02c1ea 100644 > --- a/refs/ref-cache.c > +++ b/refs/ref-cache.c > @@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator) > if (++level->index == level->dir->nr) { > /* This level is exhausted; pop up a level */ > if (--iter->levels_nr == 0) > - return ref_iterator_abort(ref_iterator); > + return ITER_DONE; As I have said, simply return `ITER_DONE` breaks the semantics of the `ITER_DONE`. > > continue; > } > @@ -452,21 +452,18 @@ static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator, > return peel_object(iter->repo, ref_iterator->oid, peeled) ? -1 : 0; > } > > -static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator) > +static void cache_ref_iterator_release(struct ref_iterator *ref_iterator) > { > struct cache_ref_iterator *iter = > (struct cache_ref_iterator *)ref_iterator; > - > free((char *)iter->prefix); > free(iter->levels); > - base_ref_iterator_free(ref_iterator); > - return ITER_DONE; > } > > static struct ref_iterator_vtable cache_ref_iterator_vtable = { > .advance = cache_ref_iterator_advance, > .peel = cache_ref_iterator_peel, > - .abort = cache_ref_iterator_abort > + .release = cache_ref_iterator_release, > }; > > struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache, > diff --git a/refs/refs-internal.h b/refs/refs-internal.h > index aaab711bb96..27ff822cf43 100644 > --- a/refs/refs-internal.h > +++ b/refs/refs-internal.h > @@ -273,11 +273,11 @@ enum do_for_each_ref_flags { > * the next reference and returns ITER_OK. The data pointed at by > * refname and oid belong to the iterator; if you want to retain them > * after calling ref_iterator_advance() again or calling > - * ref_iterator_abort(), you must make a copy. When the iteration has > + * ref_iterator_free(), you must make a copy. When the iteration has > * been exhausted, ref_iterator_advance() releases any resources > * associated with the iteration, frees the ref_iterator object, and > * returns ITER_DONE. If you want to abort the iteration early, call > - * ref_iterator_abort(), which also frees the ref_iterator object and > + * ref_iterator_free(), which also frees the ref_iterator object and > * any associated resources. If there was an internal error advancing > * to the next entry, ref_iterator_advance() aborts the iteration, > * frees the ref_iterator, and returns ITER_ERROR. > @@ -292,10 +292,8 @@ enum do_for_each_ref_flags { > * struct ref_iterator *iter = ...; > * > * while ((ok = ref_iterator_advance(iter)) == ITER_OK) { > - * if (want_to_stop_iteration()) { > - * ok = ref_iterator_abort(iter); > + * if (want_to_stop_iteration()) > * break; > - * } I also think here we have problem, at least we should set `ok = ITER_DONE`. > * > * // Access information about the current reference: > * if (!(iter->flags & REF_ISSYMREF)) > @@ -307,6 +305,7 @@ enum do_for_each_ref_flags { > * > * if (ok != ITER_DONE) > * handle_error(); > + * ref_iterator_free(iter); > */ > struct ref_iterator { > struct ref_iterator_vtable *vtable; > @@ -333,12 +332,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator); > int ref_iterator_peel(struct ref_iterator *ref_iterator, > struct object_id *peeled); > > -/* > - * End the iteration before it has been exhausted, freeing the > - * reference iterator and any associated resources and returning > - * ITER_DONE. If the abort itself failed, return ITER_ERROR. > - */ > -int ref_iterator_abort(struct ref_iterator *ref_iterator); > +/* Free the reference iterator and any associated resources. */ > +void ref_iterator_free(struct ref_iterator *ref_iterator); > > /* > * An iterator over nothing (its first ref_iterator_advance() call > @@ -438,13 +433,6 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0, > void base_ref_iterator_init(struct ref_iterator *iter, > struct ref_iterator_vtable *vtable); > > -/* > - * Base class destructor for ref_iterators. Destroy the ref_iterator > - * part of iter and shallow-free the object. This is meant to be > - * called only by the destructors of derived classes. > - */ > -void base_ref_iterator_free(struct ref_iterator *iter); > - OK, here we delete `base_ref_iterator_free`. Because we will use `ref_iterator_free`. > /* Virtual function declarations for ref_iterators: */ > > /* > @@ -463,15 +451,14 @@ typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator, > > /* > * Implementations of this function should free any resources specific > - * to the derived class, then call base_ref_iterator_free() to clean > - * up and free the ref_iterator object. > + * to the derived class. > */ > -typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator); > +typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator); So here why we name this function to `release` is that we want to release any resource related to iterator but we don't free iterator itself. [snip] --- Some my thinking after reading this whole patch: 1. We somehow misuses the "ITER_DONE". We either need to adjust its meaning (we just delete the iterator is freed part) or introduces a new state to represent that the iteration is done but we don't release its resource and free the iterator itself. 2. I don't think we need to make things complicated. From my understanding, the motivation here is that we don't want to `advance` callback to call `abort` callback. I want to ask an _important_ question here: what is the motivation we rename `abort` to `release` in the first place? As far as I know, we only call this callback in the newly created "ref_iterator_free". Although release may be more accurate, this change truly causes overhead of this patch. 3. If the motivation is that we don't want to `advance` callback to call `abort` callback. I think we could just let the user call `abort` callback for the following two situations: 1. We have exhausted the iteration. It returns `ITER_OK`. 2. We encountered the error, it returns `ITER_ERROR`. And we give the freedom to the caller. It's their duty to call `ref_iterator_abort` which cleans the resource and free the iterator. Writing here, I have always thought that there is a situation that we just want to release the resources but not want to free the iterator itself. That's why I am wondering why just add a new variable to do. However, if we just want to make the lifecycle out, we just delete each "abort" code where it frees the iterator. And we free the iterator in the "ref_iterator_abort". Should this be enough? Thanks, Jialuo
Patrick Steinhardt <ps@pks.im> writes: > The ref and reflog iterators have their lifecycle attached to iteration: > once the iterator reaches its end, it is automatically released and the > caller doesn't have to care about that anymore. When the iterator should > be released before it has been exhausted, callers must explicitly abort > the iterator via `ref_iterator_abort()`. > > This lifecycle is somewhat unusual in the Git codebase and creates two > problems: > > - Callsites need to be very careful about when exactly they call > `ref_iterator_abort()`, as calling the function is only valid when > the iterator itself still is. This leads to somewhat awkward calling > patterns in some situations. > > - It is impossible to reuse iterators and re-seek them to a different > prefix. This feature isn't supported by any iterator implementation > except for the reftable iterators anyway, but if it was implemented > it would allow us to optimize cases where we need to search for > specific references repeatedly by reusing internal state. > > Detangle the lifecycle from iteration so that we don't deallocate the > iterator anymore once it is exhausted. Instead, callers are now expected > to always call a newly introduce `ref_iterator_free()` function that > deallocates the iterator and its internal state. > > While at it, drop the return value of `ref_iterator_abort()`, which > wasn't really required by any of the iterator implementations anyway. > Furthermore, stop calling `base_ref_iterator_free()` in any of the > backends, but instead call it in `ref_iterator_free()`. > > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > builtin/clone.c | 2 + > dir-iterator.c | 24 +++++------ > dir-iterator.h | 13 ++---- > refs.c | 7 +++- > refs/debug.c | 9 ++--- > refs/files-backend.c | 36 +++++------------ > refs/iterator.c | 95 ++++++++++++++------------------------------ > refs/packed-backend.c | 27 ++++++------- > refs/ref-cache.c | 9 ++--- > refs/refs-internal.h | 31 +++++---------- > refs/reftable-backend.c | 34 ++++------------ > t/helper/test-dir-iterator.c | 1 + > 12 files changed, 99 insertions(+), 189 deletions(-) > > diff --git a/builtin/clone.c b/builtin/clone.c > index fd001d800c6..ac3e84b2b18 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest, > strbuf_setlen(src, src_len); > die(_("failed to iterate over '%s'"), src->buf); > } > + > + dir_iterator_free(iter); > } > A bit puzzled to see `dir_iterator_*` change here, I'm assuming it's linked to the 'files-backend' and perhaps similar to the changes mentioned about `ref_iterator_*` in the commit message. Would be nice to call out in the commit message too. [snip] > @@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) > } else { > if (level->entries_idx >= level->entries.nr) { > if (pop_level(iter) == 0) > - return dir_iterator_abort(dir_iterator); > + return ITER_DONE; > continue; > } > > @@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) > > if (prepare_next_entry_data(iter, name)) { > if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC) > - goto error_out; > + return ITER_ERROR; > continue; > } > > return ITER_OK; > } > - > -error_out: > - dir_iterator_abort(dir_iterator); > - return ITER_ERROR; > } Okay yeah, we're getting rid of `dir_iterator_abort` so potentially add `dir_iterator_free` below > > -int dir_iterator_abort(struct dir_iterator *dir_iterator) > +void dir_iterator_free(struct dir_iterator *dir_iterator) > { > struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator; > > + if (!iter) > + return; > + > for (; iter->levels_nr; iter->levels_nr--) { > struct dir_iterator_level *level = > &iter->levels[iter->levels_nr - 1]; > @@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator) > free(iter->levels); > strbuf_release(&iter->base.path); > free(iter); > - return ITER_DONE; > } > > struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags) Okay this makes sense! [snip] > diff --git a/refs/files-backend.c b/refs/files-backend.c > index 29f08dced40..9511b6f3448 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -919,10 +919,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator) > return ITER_OK; > } > > - iter->iter0 = NULL; > - if (ref_iterator_abort(ref_iterator) != ITER_DONE) > - ok = ITER_ERROR; > - > Since we're explicitly going to call `ref_iterator_free`, this makes sense. > return ok; > } > > @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator, > return ref_iterator_peel(iter->iter0, peeled); > } > > -static int files_ref_iterator_abort(struct ref_iterator *ref_iterator) > +static void files_ref_iterator_release(struct ref_iterator *ref_iterator) > { > struct files_ref_iterator *iter = > (struct files_ref_iterator *)ref_iterator; > - int ok = ITER_DONE; > - > - if (iter->iter0) > - ok = ref_iterator_abort(iter->iter0); > - > - base_ref_iterator_free(ref_iterator); > - return ok; > + ref_iterator_free(iter->iter0); > } > I like how much more cleaner it looks now. [snip] > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index a7b6f74b6e3..38a1956d1a8 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -954,9 +954,6 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator) > return ITER_OK; > } > > - if (ref_iterator_abort(ref_iterator) != ITER_DONE) > - ok = ITER_ERROR; > - > return ok; > } > The merged_iterator is used to combine the files and packed backend iterators to provide a uniform view over them. Likewise the changes here seem similar too. > @@ -976,23 +973,19 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator, > } > } > > -static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator) > +static void packed_ref_iterator_release(struct ref_iterator *ref_iterator) > { > struct packed_ref_iterator *iter = > (struct packed_ref_iterator *)ref_iterator; > - int ok = ITER_DONE; > - > strbuf_release(&iter->refname_buf); > free(iter->jump); > release_snapshot(iter->snapshot); > - base_ref_iterator_free(ref_iterator); > - return ok; > } > > static struct ref_iterator_vtable packed_ref_iterator_vtable = { > .advance = packed_ref_iterator_advance, > .peel = packed_ref_iterator_peel, > - .abort = packed_ref_iterator_abort > + .release = packed_ref_iterator_release, > }; > > static int jump_list_entry_cmp(const void *va, const void *vb) > @@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs, > */ > iter = packed_ref_iterator_begin(&refs->base, "", NULL, > DO_FOR_EACH_INCLUDE_BROKEN); > - if ((ok = ref_iterator_advance(iter)) != ITER_OK) > + if ((ok = ref_iterator_advance(iter)) != ITER_OK) { > + ref_iterator_free(iter); Nit: Since we don't return early here, wouldn't the `ref_iterator_free` at the end of the function be sufficient? I think the only early return when `iter == NULL` is towards the end of the function, where it might be better to add a `goto error`. [snip]
On Wed, Feb 19, 2025 at 12:52:23AM +0800, shejialuo wrote: > On Mon, Feb 17, 2025 at 04:50:21PM +0100, Patrick Steinhardt wrote: > > The ref and reflog iterators have their lifecycle attached to iteration: > > once the iterator reaches its end, it is automatically released and the > > caller doesn't have to care about that anymore. When the iterator should > > be released before it has been exhausted, callers must explicitly abort > > the iterator via `ref_iterator_abort()`. > > > > This lifecycle is somewhat unusual in the Git codebase and creates two > > problems: > > > > - Callsites need to be very careful about when exactly they call > > `ref_iterator_abort()`, as calling the function is only valid when > > the iterator itself still is. This leads to somewhat awkward calling > > patterns in some situations. > > > > In what situations, the iterator has been disappeared when we call > `ref_iterator_abort`? Why is this awkward? When reading, I am really > curious about this. It leads to patterns where you have to call `ref_iterator_abort()`, but only in a subset of code paths. You need to be really careful about the lifetime of the iterator, and I had several occasions where I was doing the wrong thing because I missed this subtlety. Compared to that, it is trivial (and less code as demonstrated by the patch) to unconditionally call `ref_iterator_free()`. > > - It is impossible to reuse iterators and re-seek them to a different > > prefix. This feature isn't supported by any iterator implementation > > except for the reftable iterators anyway, but if it was implemented > > it would allow us to optimize cases where we need to search for > > specific references repeatedly by reusing internal state. > > > > So, the reason why we cannot reuse the iterator is that we will > deallocate the iterator? So, below we want to detangle the lifecycle. I > don't know whether my understanding is correct. Yup, your understanding is correct. A deallocated iterator cannot be reused. > > Detangle the lifecycle from iteration so that we don't deallocate the > > iterator anymore once it is exhausted. Instead, callers are now expected > > to always call a newly introduce `ref_iterator_free()` function that > > deallocates the iterator and its internal state. > > > > A design question: why do not just introduce a variable, for example, > `unsigned int free` to indicate whether we need to free the iterator? Because that would make the code even more subtle than it already is. Doing things unconditionally is always easier to reason about than doing them conditionally. > > diff --git a/dir-iterator.c b/dir-iterator.c > > index de619846f29..857e1d9bdaf 100644 > > --- a/dir-iterator.c > > +++ b/dir-iterator.c > > @@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) > > int ret = next_directory_entry(level->dir, iter->base.path.buf, &de); > > if (ret < 0) { > > if (iter->flags & DIR_ITERATOR_PEDANTIC) > > - goto error_out; > > + return ITER_ERROR; > > continue; > > } else if (ret > 0) { > > if (pop_level(iter) == 0) > > - return dir_iterator_abort(dir_iterator); > > + return ITER_DONE; > > Instead of calling `dir_iterator_abort`, we just return `ITER_DONE`. > However, this does not make sense. We break the semantics of > `ITER_DONE`. Let me cite the comment from "iterator.h": > > /* > * The iterator is exhausted and has been freed. > */ > #define ITER_DONE -1 > > However, we don't free the iterator here. Oh, yeah, I'll have to adapt that comment, thanks for pointing it out. > > @@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator) > > free(iter->levels); > > strbuf_release(&iter->base.path); > > free(iter); > > - return ITER_DONE; > > I am confused that why we don't provide `dir_iterator_release` for dir > iterator? For other ref related iterators, we will rename "*abort" to > "*release" and drop the operation to free the iterator. However, for dir > iterator, we rename its "*abort" to "*free". > > I think this does not make sense. It causes inconsistency. Although dir > iterator does _not_ have "peel" method. But it does have "advance" and > "abort" methods just like ref iterators. > > I think you have already considered this problem. I guess that's the > reason why in the below comment you typed `dir_iterator_release` instead > of `dir_iterator_free`. The `dir_iterator` itself is already being inconsistent with every other iterator that we have because it is not a `ref_iterator`. It is _used_ to implement ref iterators, but doesn't provide the same interface. And because of that we cannot rely on `ref_iterator_free()` to free it, but must instead provide `dir_iterator_free()` to do so. I have amended the commit message to explain why this one is special. > > diff --git a/dir-iterator.h b/dir-iterator.h > > index 6d438809b6e..01f51f6bac1 100644 > > --- a/dir-iterator.h > > +++ b/dir-iterator.h > > @@ -27,10 +27,8 @@ > > * goto error_handler; > > * > > * while ((ok = dir_iterator_advance(iter)) == ITER_OK) { > > - * if (want_to_stop_iteration()) { > > - * ok = dir_iterator_abort(iter); > > + * if (want_to_stop_iteration()) > > * break; > > - * } > > Is this correct? If `want_to_stop_iteration()` is true, the `ok` will be > `ITER_OK` and then it breaks the loop to jump to check whether the `ok` > is `ITER_DONE`. Of course it is not, it will call `handle_error`. At > least, we should assign `ok = ITER_DONE`. True, fixed. > > * > > * // Access information about the current path: > > * if (S_ISDIR(iter->st.st_mode)) > > @@ -39,6 +37,7 @@ > > * > > * if (ok != ITER_DONE) > > * handle_error(); > > + * dir_iterator_release(iter); > > I think this is a typo. Should `dir_iterator_release` be > `dir_iterator_free`? Yup, fixed. > > diff --git a/refs/files-backend.c b/refs/files-backend.c > > index 29f08dced40..9511b6f3448 100644 > > --- a/refs/files-backend.c > > +++ b/refs/files-backend.c > > @@ -919,10 +919,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator) > > return ITER_OK; > > } > > > > - iter->iter0 = NULL; > > - if (ref_iterator_abort(ref_iterator) != ITER_DONE) > > - ok = ITER_ERROR; > > - > > We don't abort the iterator in `advance`. Is this because we want to > reuse this iterator? Exactly, this is basically the gist of this whole patch. > > @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator, > > return ref_iterator_peel(iter->iter0, peeled); > > } > > > > -int ref_iterator_abort(struct ref_iterator *ref_iterator) > > +void ref_iterator_free(struct ref_iterator *ref_iterator) > > { > > - return ref_iterator->vtable->abort(ref_iterator); > > + if (ref_iterator) { > > + ref_iterator->vtable->release(ref_iterator); > > + /* Help make use-after-free bugs fail quickly: */ > > + ref_iterator->vtable = NULL; > > + free(ref_iterator); > > + } > > } > > > > So, when calling `ref_iterator_free`, we will call corresponding > "release" method to release the resources associated and then we free > the iterator. Would this be too complicated? From my view, we could just > make the `ref_iterator_abort` name unchanged but add a new variable such > as "unsigned int free_iterator". And we change each "abort" callback to > avoid free the iterator. This would be much simpler. I think adding a separate variable to track whether or not things should be freed would make things way more complicated. I would claim the opposite: the fact that the patch removes 100 lines of code demonstrates quite neatly that the new design is way simpler and needs less logic. It's also simpler to reason about from my perspective: you allocate an iterator, you free it. No conditionals, no nothing. > [snip] > > > diff --git a/refs/ref-cache.c b/refs/ref-cache.c > > index 02f09e4df88..6457e02c1ea 100644 > > --- a/refs/ref-cache.c > > +++ b/refs/ref-cache.c > > @@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator) > > if (++level->index == level->dir->nr) { > > /* This level is exhausted; pop up a level */ > > if (--iter->levels_nr == 0) > > - return ref_iterator_abort(ref_iterator); > > + return ITER_DONE; > > As I have said, simply return `ITER_DONE` breaks the semantics of the > `ITER_DONE`. It doesn't: `ITER_DONE` indicates that the iterator has been exhausted. The current comment is stale though, as it claims that the iterator will also have been free'd. > > diff --git a/refs/refs-internal.h b/refs/refs-internal.h > > index aaab711bb96..27ff822cf43 100644 > > --- a/refs/refs-internal.h > > +++ b/refs/refs-internal.h > > @@ -292,10 +292,8 @@ enum do_for_each_ref_flags { > > * struct ref_iterator *iter = ...; > > * > > * while ((ok = ref_iterator_advance(iter)) == ITER_OK) { > > - * if (want_to_stop_iteration()) { > > - * ok = ref_iterator_abort(iter); > > + * if (want_to_stop_iteration()) > > * break; > > - * } > > I also think here we have problem, at least we should set `ok = > ITER_DONE`. Yup, same as in the other comment indeed. > Some my thinking after reading this whole patch: > > 1. We somehow misuses the "ITER_DONE". We either need to adjust its > meaning (we just delete the iterator is freed part) or introduces a new > state to represent that the iteration is done but we don't release its > resource and free the iterator itself. Yup, addressed. > 2. I don't think we need to make things complicated. From my > understanding, the motivation here is that we don't want to `advance` > callback to call `abort` callback. I want to ask an _important_ question > here: what is the motivation we rename `abort` to `release` in the first > place? As far as I know, we only call this callback in the newly created > "ref_iterator_free". Although release may be more accurate, this change > truly causes overhead of this patch. We have to touch up all of these functions anyway, so renaming them at the same point doesn't feel like it adds any more complexity. > 3. If the motivation is that we don't want to `advance` callback to call > `abort` callback. I think we could just let the user call `abort` > callback for the following two situations: > > 1. We have exhausted the iteration. It returns `ITER_OK`. This is impossible because the caller wouldn't be able to discern an exhausted iterator from an iterator that still has entries. We have to discern `ITER_OK` and `ITER_DONE`. > 2. We encountered the error, it returns `ITER_ERROR`. Yup. > And we give the freedom to the caller. It's their duty to call > `ref_iterator_abort` which cleans the resource and free the iterator. Yup. > Writing here, I have always thought that there is a situation that we > just want to release the resources but not want to free the iterator > itself. That's why I am wondering why just add a new variable to do. > However, if we just want to make the lifecycle out, we just delete each > "abort" code where it frees the iterator. And we free the iterator in > the "ref_iterator_abort". Should this be enough? As mentioned, I don't think a new variable would lead to a simplified architecture. With re-seekable iterators the caller is the one who needs to control whether the iterator should or should not be free'd, as they are the only one who knows whether they want to reuse it. So making it the responsibility of the callers to release the iterator is the proper way to do it. I also don't quite see the complexity argument. The patch rather clearly shows that we're _reducing_ complexity as it allows us to drop around a 100 lines of code. We can stop worrying about whether or not we have to call `ref_iterator_abort()` and don't have to worry about any conditions that leak from the iterator subsystem to the callers. We just free the iterator once we're done with it and call it a day. Thanks for your input! Patrick
On Tue, Feb 18, 2025 at 09:13:55AM -0800, Karthik Nayak wrote: > Patrick Steinhardt <ps@pks.im> writes: > > diff --git a/builtin/clone.c b/builtin/clone.c > > index fd001d800c6..ac3e84b2b18 100644 > > --- a/builtin/clone.c > > +++ b/builtin/clone.c > > @@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest, > > strbuf_setlen(src, src_len); > > die(_("failed to iterate over '%s'"), src->buf); > > } > > + > > + dir_iterator_free(iter); > > } > > > > A bit puzzled to see `dir_iterator_*` change here, I'm assuming it's > linked to the 'files-backend' and perhaps similar to the changes > mentioned about `ref_iterator_*` in the commit message. Would be nice to > call out in the commit message too. Yeah, that's the reason. I've added a note to the commit message. > > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > > index a7b6f74b6e3..38a1956d1a8 100644 > > --- a/refs/packed-backend.c > > +++ b/refs/packed-backend.c > > @@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs, > > */ > > iter = packed_ref_iterator_begin(&refs->base, "", NULL, > > DO_FOR_EACH_INCLUDE_BROKEN); > > - if ((ok = ref_iterator_advance(iter)) != ITER_OK) > > + if ((ok = ref_iterator_advance(iter)) != ITER_OK) { > > + ref_iterator_free(iter); > > Nit: Since we don't return early here, wouldn't the `ref_iterator_free` > at the end of the function be sufficient? I think the only early return > when `iter == NULL` is towards the end of the function, where it might > be better to add a `goto error`. Yeah, the code here could definitely be improved with the new semantics. But I was aiming to keep the required refactoring work at callsites to the bare minimum, so I'd rather prefer to keep this unchanged. Patrick
On Wed, Feb 19, 2025 at 12:52:06PM +0100, Patrick Steinhardt wrote: > > > @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator, > > > return ref_iterator_peel(iter->iter0, peeled); > > > } > > > > > > -int ref_iterator_abort(struct ref_iterator *ref_iterator) > > > +void ref_iterator_free(struct ref_iterator *ref_iterator) > > > { > > > - return ref_iterator->vtable->abort(ref_iterator); > > > + if (ref_iterator) { > > > + ref_iterator->vtable->release(ref_iterator); > > > + /* Help make use-after-free bugs fail quickly: */ > > > + ref_iterator->vtable = NULL; > > > + free(ref_iterator); > > > + } > > > } > > > > > > > So, when calling `ref_iterator_free`, we will call corresponding > > "release" method to release the resources associated and then we free > > the iterator. Would this be too complicated? From my view, we could just > > make the `ref_iterator_abort` name unchanged but add a new variable such > > as "unsigned int free_iterator". And we change each "abort" callback to > > avoid free the iterator. This would be much simpler. > > I think adding a separate variable to track whether or not things should > be freed would make things way more complicated. I would claim the > opposite: the fact that the patch removes 100 lines of code demonstrates > quite neatly that the new design is way simpler and needs less logic. > I agree with you with the current implementation, we don't need to worry about calling "abort" callback in "advance". And the logic is simpler. When writing this comment, I have thought that there is a situation that we just want to call "release" callback. So, that's the reason why I have asked you why not just add a new variable. However, we don't call "release" callback expect in `ref_iterator_free`. [snip] > > 2. I don't think we need to make things complicated. From my > > understanding, the motivation here is that we don't want to `advance` > > callback to call `abort` callback. I want to ask an _important_ question > > here: what is the motivation we rename `abort` to `release` in the first > > place? As far as I know, we only call this callback in the newly created > > "ref_iterator_free". Although release may be more accurate, this change > > truly causes overhead of this patch. > > We have to touch up all of these functions anyway, so renaming them at > the same point doesn't feel like it adds any more complexity. > I will explain this in the later. [snip] > > Writing here, I have always thought that there is a situation that we > > just want to release the resources but not want to free the iterator > > itself. That's why I am wondering why just add a new variable to do. > > However, if we just want to make the lifecycle out, we just delete each > > "abort" code where it frees the iterator. And we free the iterator in > > the "ref_iterator_abort". Should this be enough? > > As mentioned, I don't think a new variable would lead to a simplified > architecture. With re-seekable iterators the caller is the one who needs > to control whether the iterator should or should not be free'd, as they > are the only one who knows whether they want to reuse it. So making it > the responsibility of the callers to release the iterator is the proper > way to do it. > Yes, actually I think you have misunderstood my meaning here. I have abandoned the idea to add a new variable when writing here. After reading through the patch, I know that your motivation. My point is "However, ..." part. > I also don't quite see the complexity argument. The patch rather clearly > shows that we're _reducing_ complexity as it allows us to drop around a > 100 lines of code. We can stop worrying about whether or not we have to > call `ref_iterator_abort()` and don't have to worry about any conditions > that leak from the iterator subsystem to the callers. We just free the > iterator once we're done with it and call it a day. > Yes, I agree with you that we truly reduce complexity here. And as you have said, the user allocate the iterator and free the iterator. With this, we make call sequence clearer. But there is one thing I want to argue with. I don't think we need to rename "abort" callback to "release" and also "ref_iterator_abort" to "ref_iterator_free" for the following reasons: 1. We never call "release" expect in the "ref_iterator_free" function. For other exposed functions "ref_iterator_advance", "ref_iterator_peel" and the original "ref_iterator_abort". We will just call the registered callback "advance", "peel" or "abort" via virtual table. I somehow think we should follow this pattern. But I don't know actually. 2. When I read the patch yesterday, I really wonder what is the difference between "release" and "free". Why do we only change the "ref_iterator_abort" to "ref_iterator_free" but for the callback, we rename "abort" to "release". I know that you want to distinguish to emphasis that we won't free the iterator but only release its resource for ref iterator. But could abort also mean this? Thanks, Jialuo
On Wed, Feb 19, 2025 at 08:41:03PM +0800, shejialuo wrote: > But there is one thing I want to argue with. I don't think we need to > rename "abort" callback to "release" and also "ref_iterator_abort" to > "ref_iterator_free" for the following reasons: > > 1. We never call "release" expect in the "ref_iterator_free" function. > For other exposed functions "ref_iterator_advance", "ref_iterator_peel" > and the original "ref_iterator_abort". We will just call the registered > callback "advance", "peel" or "abort" via virtual table. I somehow think > we should follow this pattern. But I don't know actually. > 2. When I read the patch yesterday, I really wonder what is the > difference between "release" and "free". Why do we only change the > "ref_iterator_abort" to "ref_iterator_free" but for the callback, we > rename "abort" to "release". I know that you want to distinguish to > emphasis that we won't free the iterator but only release its resource > for ref iterator. But could abort also mean this? The difference between "release" and "free" is explicitly documented in our CodingGuidelines. Quoting the relevant parts: - `S_release()` releases a structure's contents without freeing the structure. - `S_free()` releases a structure's contents and frees the structure. So following these coding guidelines, we have to call the underlying implementations that are specific to the iterators `release()` because they don't free the iterator itself. And because the generic part _does_ free the iterator itself in addition to releasing its state, it has to be called `free()`. Regarding the question why to even rename `ref_iterator_abort()` itself: this is done to avoid confusion going forward. Previously it really only had to be called when you actually wanted to abort an ongoing iteration over its yielded references. This is not the case anymore, and now you have to call it unconditionally after you're done with the iterator. So while the naming previously made sense, now it doesn't anymore. Patrick
On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote: > On Wed, Feb 19, 2025 at 08:41:03PM +0800, shejialuo wrote: > > But there is one thing I want to argue with. I don't think we need to > > rename "abort" callback to "release" and also "ref_iterator_abort" to > > "ref_iterator_free" for the following reasons: > > > > 1. We never call "release" expect in the "ref_iterator_free" function. > > For other exposed functions "ref_iterator_advance", "ref_iterator_peel" > > and the original "ref_iterator_abort". We will just call the registered > > callback "advance", "peel" or "abort" via virtual table. I somehow think > > we should follow this pattern. But I don't know actually. > > 2. When I read the patch yesterday, I really wonder what is the > > difference between "release" and "free". Why do we only change the > > "ref_iterator_abort" to "ref_iterator_free" but for the callback, we > > rename "abort" to "release". I know that you want to distinguish to > > emphasis that we won't free the iterator but only release its resource > > for ref iterator. But could abort also mean this? > > The difference between "release" and "free" is explicitly documented in > our CodingGuidelines. Quoting the relevant parts: > > - `S_release()` releases a structure's contents without freeing the > structure. > > - `S_free()` releases a structure's contents and frees the > structure. > > So following these coding guidelines, we have to call the underlying > implementations that are specific to the iterators `release()` because > they don't free the iterator itself. And because the generic part _does_ > free the iterator itself in addition to releasing its state, it has to > be called `free()`. > Make sense. > Regarding the question why to even rename `ref_iterator_abort()` itself: > this is done to avoid confusion going forward. Previously it really only > had to be called when you actually wanted to abort an ongoing iteration > over its yielded references. This is not the case anymore, and now you > have to call it unconditionally after you're done with the iterator. So > while the naming previously made sense, now it doesn't anymore. > Good point, I didn't realise this part. Thanks for the detailed explanation. I will continue to review the later patches. However, I won't touch the oid part, because I am not familiar with this. By the way, I think we miss out one thing in this patch: We forget to free the dir iterator defined in the "files-backend.c::files_fsck_refs_dir". I have just remembered that I use dir iterator when checking the ref consistency. Thanks, Jialuo > Patrick
On Wed, Feb 19, 2025 at 09:06:58PM +0800, shejialuo wrote: > On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote: > > Regarding the question why to even rename `ref_iterator_abort()` itself: > > this is done to avoid confusion going forward. Previously it really only > > had to be called when you actually wanted to abort an ongoing iteration > > over its yielded references. This is not the case anymore, and now you > > have to call it unconditionally after you're done with the iterator. So > > while the naming previously made sense, now it doesn't anymore. > > > > Good point, I didn't realise this part. Thanks for the detailed > explanation. I will continue to review the later patches. However, I > won't touch the oid part, because I am not familiar with this. By the > way, I think we miss out one thing in this patch: > > We forget to free the dir iterator defined in the > "files-backend.c::files_fsck_refs_dir". I have just remembered that I > use dir iterator when checking the ref consistency. Hm, good point. Why doesn't CI complain about this leak...? I'll investigate, thanks for the hint! Patrick
On Wed, Feb 19, 2025 at 02:17:07PM +0100, Patrick Steinhardt wrote: > On Wed, Feb 19, 2025 at 09:06:58PM +0800, shejialuo wrote: > > On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote: > > > Regarding the question why to even rename `ref_iterator_abort()` itself: > > > this is done to avoid confusion going forward. Previously it really only > > > had to be called when you actually wanted to abort an ongoing iteration > > > over its yielded references. This is not the case anymore, and now you > > > have to call it unconditionally after you're done with the iterator. So > > > while the naming previously made sense, now it doesn't anymore. > > > > > > > Good point, I didn't realise this part. Thanks for the detailed > > explanation. I will continue to review the later patches. However, I > > won't touch the oid part, because I am not familiar with this. By the > > way, I think we miss out one thing in this patch: > > > > We forget to free the dir iterator defined in the > > "files-backend.c::files_fsck_refs_dir". I have just remembered that I > > use dir iterator when checking the ref consistency. > > Hm, good point. Why doesn't CI complain about this leak...? I'll > investigate, thanks for the hint! Wait, no, I had been looking at the wrong branch. We do free the iterator: diff --git a/refs/files-backend.c b/refs/files-backend.c index 11a620ea11a..859f1c11941 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3837,6 +3820,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store, ret = error(_("failed to iterate over '%s'"), sb.buf); out: + dir_iterator_free(iter); strbuf_release(&sb); strbuf_release(&refname); return ret; Patrick
On Wed, Feb 19, 2025 at 02:20:15PM +0100, Patrick Steinhardt wrote: > On Wed, Feb 19, 2025 at 02:17:07PM +0100, Patrick Steinhardt wrote: > > On Wed, Feb 19, 2025 at 09:06:58PM +0800, shejialuo wrote: > > > On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote: > > > > Regarding the question why to even rename `ref_iterator_abort()` itself: > > > > this is done to avoid confusion going forward. Previously it really only > > > > had to be called when you actually wanted to abort an ongoing iteration > > > > over its yielded references. This is not the case anymore, and now you > > > > have to call it unconditionally after you're done with the iterator. So > > > > while the naming previously made sense, now it doesn't anymore. > > > > > > > > > > Good point, I didn't realise this part. Thanks for the detailed > > > explanation. I will continue to review the later patches. However, I > > > won't touch the oid part, because I am not familiar with this. By the > > > way, I think we miss out one thing in this patch: > > > > > > We forget to free the dir iterator defined in the > > > "files-backend.c::files_fsck_refs_dir". I have just remembered that I > > > use dir iterator when checking the ref consistency. > > > > Hm, good point. Why doesn't CI complain about this leak...? I'll > > investigate, thanks for the hint! > > Wait, no, I had been looking at the wrong branch. We do free the > iterator: > > diff --git a/refs/files-backend.c b/refs/files-backend.c > index 11a620ea11a..859f1c11941 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -3837,6 +3820,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store, > ret = error(_("failed to iterate over '%s'"), sb.buf); > > out: > + dir_iterator_free(iter); > strbuf_release(&sb); > strbuf_release(&refname); > return ret; > > Patrick Oh, my mistake. I omit that part during review... Sorry here.
diff --git a/builtin/clone.c b/builtin/clone.c index fd001d800c6..ac3e84b2b18 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest, strbuf_setlen(src, src_len); die(_("failed to iterate over '%s'"), src->buf); } + + dir_iterator_free(iter); } static void clone_local(const char *src_repo, const char *dest_repo) diff --git a/dir-iterator.c b/dir-iterator.c index de619846f29..857e1d9bdaf 100644 --- a/dir-iterator.c +++ b/dir-iterator.c @@ -193,9 +193,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) { if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC) - goto error_out; + return ITER_ERROR; if (iter->levels_nr == 0) - goto error_out; + return ITER_ERROR; } /* Loop until we find an entry that we can give back to the caller. */ @@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) int ret = next_directory_entry(level->dir, iter->base.path.buf, &de); if (ret < 0) { if (iter->flags & DIR_ITERATOR_PEDANTIC) - goto error_out; + return ITER_ERROR; continue; } else if (ret > 0) { if (pop_level(iter) == 0) - return dir_iterator_abort(dir_iterator); + return ITER_DONE; continue; } @@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) } else { if (level->entries_idx >= level->entries.nr) { if (pop_level(iter) == 0) - return dir_iterator_abort(dir_iterator); + return ITER_DONE; continue; } @@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) if (prepare_next_entry_data(iter, name)) { if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC) - goto error_out; + return ITER_ERROR; continue; } return ITER_OK; } - -error_out: - dir_iterator_abort(dir_iterator); - return ITER_ERROR; } -int dir_iterator_abort(struct dir_iterator *dir_iterator) +void dir_iterator_free(struct dir_iterator *dir_iterator) { struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator; + if (!iter) + return; + for (; iter->levels_nr; iter->levels_nr--) { struct dir_iterator_level *level = &iter->levels[iter->levels_nr - 1]; @@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator) free(iter->levels); strbuf_release(&iter->base.path); free(iter); - return ITER_DONE; } struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags) @@ -301,7 +299,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags) return dir_iterator; error_out: - dir_iterator_abort(dir_iterator); + dir_iterator_free(dir_iterator); errno = saved_errno; return NULL; } diff --git a/dir-iterator.h b/dir-iterator.h index 6d438809b6e..01f51f6bac1 100644 --- a/dir-iterator.h +++ b/dir-iterator.h @@ -27,10 +27,8 @@ * goto error_handler; * * while ((ok = dir_iterator_advance(iter)) == ITER_OK) { - * if (want_to_stop_iteration()) { - * ok = dir_iterator_abort(iter); + * if (want_to_stop_iteration()) * break; - * } * * // Access information about the current path: * if (S_ISDIR(iter->st.st_mode)) @@ -39,6 +37,7 @@ * * if (ok != ITER_DONE) * handle_error(); + * dir_iterator_release(iter); * * Callers are allowed to modify iter->path while they are working, * but they must restore it to its original contents before calling @@ -107,11 +106,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags); */ int dir_iterator_advance(struct dir_iterator *iterator); -/* - * End the iteration before it has been exhausted. Free the - * dir_iterator and any associated resources and return ITER_DONE. On - * error, free the dir_iterator and return ITER_ERROR. - */ -int dir_iterator_abort(struct dir_iterator *iterator); +/* Free the dir_iterator and any associated resources. */ +void dir_iterator_free(struct dir_iterator *iterator); #endif diff --git a/refs.c b/refs.c index eaf41421f50..8eff60a2186 100644 --- a/refs.c +++ b/refs.c @@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs, { struct strbuf dirname = STRBUF_INIT; struct strbuf referent = STRBUF_INIT; + struct ref_iterator *iter = NULL; struct strset dirnames; int ret = -1; @@ -2552,7 +2553,6 @@ int refs_verify_refnames_available(struct ref_store *refs, strbuf_addch(&dirname, '/'); if (!initial_transaction) { - struct ref_iterator *iter; int ok; iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0, @@ -2564,12 +2564,14 @@ int refs_verify_refnames_available(struct ref_store *refs, strbuf_addf(err, _("'%s' exists; cannot create '%s'"), iter->refname, refname); - ref_iterator_abort(iter); goto cleanup; } if (ok != ITER_DONE) BUG("error while iterating over references"); + + ref_iterator_free(iter); + iter = NULL; } extra_refname = find_descendant_ref(dirname.buf, extras, skip); @@ -2586,6 +2588,7 @@ int refs_verify_refnames_available(struct ref_store *refs, strbuf_release(&referent); strbuf_release(&dirname); strset_clear(&dirnames); + ref_iterator_free(iter); return ret; } diff --git a/refs/debug.c b/refs/debug.c index fbc4df08b43..a9786da4ba1 100644 --- a/refs/debug.c +++ b/refs/debug.c @@ -179,19 +179,18 @@ static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator, return res; } -static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator) +static void debug_ref_iterator_release(struct ref_iterator *ref_iterator) { struct debug_ref_iterator *diter = (struct debug_ref_iterator *)ref_iterator; - int res = diter->iter->vtable->abort(diter->iter); - trace_printf_key(&trace_refs, "iterator_abort: %d\n", res); - return res; + diter->iter->vtable->release(diter->iter); + trace_printf_key(&trace_refs, "iterator_abort\n"); } static struct ref_iterator_vtable debug_ref_iterator_vtable = { .advance = debug_ref_iterator_advance, .peel = debug_ref_iterator_peel, - .abort = debug_ref_iterator_abort, + .release = debug_ref_iterator_release, }; static struct ref_iterator * diff --git a/refs/files-backend.c b/refs/files-backend.c index 29f08dced40..9511b6f3448 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -919,10 +919,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator) return ITER_OK; } - iter->iter0 = NULL; - if (ref_iterator_abort(ref_iterator) != ITER_DONE) - ok = ITER_ERROR; - return ok; } @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator, return ref_iterator_peel(iter->iter0, peeled); } -static int files_ref_iterator_abort(struct ref_iterator *ref_iterator) +static void files_ref_iterator_release(struct ref_iterator *ref_iterator) { struct files_ref_iterator *iter = (struct files_ref_iterator *)ref_iterator; - int ok = ITER_DONE; - - if (iter->iter0) - ok = ref_iterator_abort(iter->iter0); - - base_ref_iterator_free(ref_iterator); - return ok; + ref_iterator_free(iter->iter0); } static struct ref_iterator_vtable files_ref_iterator_vtable = { .advance = files_ref_iterator_advance, .peel = files_ref_iterator_peel, - .abort = files_ref_iterator_abort, + .release = files_ref_iterator_release, }; static struct ref_iterator *files_ref_iterator_begin( @@ -1382,7 +1372,7 @@ static int should_pack_refs(struct files_ref_store *refs, iter->flags, opts)) refcount++; if (refcount >= limit) { - ref_iterator_abort(iter); + ref_iterator_free(iter); return 1; } } @@ -1390,6 +1380,7 @@ static int should_pack_refs(struct files_ref_store *refs, if (ret != ITER_DONE) die("error while iterating over references"); + ref_iterator_free(iter); return 0; } @@ -1456,6 +1447,7 @@ static int files_pack_refs(struct ref_store *ref_store, packed_refs_unlock(refs->packed_ref_store); prune_refs(refs, &refs_to_prune); + ref_iterator_free(iter); strbuf_release(&err); return 0; } @@ -2303,9 +2295,6 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator) return ITER_OK; } - iter->dir_iterator = NULL; - if (ref_iterator_abort(ref_iterator) == ITER_ERROR) - ok = ITER_ERROR; return ok; } @@ -2315,23 +2304,17 @@ static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED, BUG("ref_iterator_peel() called for reflog_iterator"); } -static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator) +static void files_reflog_iterator_release(struct ref_iterator *ref_iterator) { struct files_reflog_iterator *iter = (struct files_reflog_iterator *)ref_iterator; - int ok = ITER_DONE; - - if (iter->dir_iterator) - ok = dir_iterator_abort(iter->dir_iterator); - - base_ref_iterator_free(ref_iterator); - return ok; + dir_iterator_free(iter->dir_iterator); } static struct ref_iterator_vtable files_reflog_iterator_vtable = { .advance = files_reflog_iterator_advance, .peel = files_reflog_iterator_peel, - .abort = files_reflog_iterator_abort, + .release = files_reflog_iterator_release, }; static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store, @@ -3808,6 +3791,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store, ret = error(_("failed to iterate over '%s'"), sb.buf); out: + dir_iterator_free(iter); strbuf_release(&sb); strbuf_release(&refname); return ret; diff --git a/refs/iterator.c b/refs/iterator.c index d25e568bf0b..aaeff270437 100644 --- a/refs/iterator.c +++ b/refs/iterator.c @@ -21,9 +21,14 @@ int ref_iterator_peel(struct ref_iterator *ref_iterator, return ref_iterator->vtable->peel(ref_iterator, peeled); } -int ref_iterator_abort(struct ref_iterator *ref_iterator) +void ref_iterator_free(struct ref_iterator *ref_iterator) { - return ref_iterator->vtable->abort(ref_iterator); + if (ref_iterator) { + ref_iterator->vtable->release(ref_iterator); + /* Help make use-after-free bugs fail quickly: */ + ref_iterator->vtable = NULL; + free(ref_iterator); + } } void base_ref_iterator_init(struct ref_iterator *iter, @@ -36,20 +41,13 @@ void base_ref_iterator_init(struct ref_iterator *iter, iter->flags = 0; } -void base_ref_iterator_free(struct ref_iterator *iter) -{ - /* Help make use-after-free bugs fail quickly: */ - iter->vtable = NULL; - free(iter); -} - struct empty_ref_iterator { struct ref_iterator base; }; -static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator) +static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED) { - return ref_iterator_abort(ref_iterator); + return ITER_DONE; } static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED, @@ -58,16 +56,14 @@ static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED, BUG("peel called for empty iterator"); } -static int empty_ref_iterator_abort(struct ref_iterator *ref_iterator) +static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED) { - base_ref_iterator_free(ref_iterator); - return ITER_DONE; } static struct ref_iterator_vtable empty_ref_iterator_vtable = { .advance = empty_ref_iterator_advance, .peel = empty_ref_iterator_peel, - .abort = empty_ref_iterator_abort, + .release = empty_ref_iterator_release, }; struct ref_iterator *empty_ref_iterator_begin(void) @@ -151,11 +147,13 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator) if (!iter->current) { /* Initialize: advance both iterators to their first entries */ if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) { + ref_iterator_free(iter->iter0); iter->iter0 = NULL; if (ok == ITER_ERROR) goto error; } if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) { + ref_iterator_free(iter->iter1); iter->iter1 = NULL; if (ok == ITER_ERROR) goto error; @@ -166,6 +164,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator) * entry: */ if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) { + ref_iterator_free(*iter->current); *iter->current = NULL; if (ok == ITER_ERROR) goto error; @@ -179,9 +178,8 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator) iter->select(iter->iter0, iter->iter1, iter->cb_data); if (selection == ITER_SELECT_DONE) { - return ref_iterator_abort(ref_iterator); + return ITER_DONE; } else if (selection == ITER_SELECT_ERROR) { - ref_iterator_abort(ref_iterator); return ITER_ERROR; } @@ -195,6 +193,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator) if (selection & ITER_SKIP_SECONDARY) { if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) { + ref_iterator_free(*secondary); *secondary = NULL; if (ok == ITER_ERROR) goto error; @@ -211,7 +210,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator) } error: - ref_iterator_abort(ref_iterator); return ITER_ERROR; } @@ -227,28 +225,18 @@ static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator, return ref_iterator_peel(*iter->current, peeled); } -static int merge_ref_iterator_abort(struct ref_iterator *ref_iterator) +static void merge_ref_iterator_release(struct ref_iterator *ref_iterator) { struct merge_ref_iterator *iter = (struct merge_ref_iterator *)ref_iterator; - int ok = ITER_DONE; - - if (iter->iter0) { - if (ref_iterator_abort(iter->iter0) != ITER_DONE) - ok = ITER_ERROR; - } - if (iter->iter1) { - if (ref_iterator_abort(iter->iter1) != ITER_DONE) - ok = ITER_ERROR; - } - base_ref_iterator_free(ref_iterator); - return ok; + ref_iterator_free(iter->iter0); + ref_iterator_free(iter->iter1); } static struct ref_iterator_vtable merge_ref_iterator_vtable = { .advance = merge_ref_iterator_advance, .peel = merge_ref_iterator_peel, - .abort = merge_ref_iterator_abort, + .release = merge_ref_iterator_release, }; struct ref_iterator *merge_ref_iterator_begin( @@ -310,10 +298,10 @@ struct ref_iterator *overlay_ref_iterator_begin( * them. */ if (is_empty_ref_iterator(front)) { - ref_iterator_abort(front); + ref_iterator_free(front); return back; } else if (is_empty_ref_iterator(back)) { - ref_iterator_abort(back); + ref_iterator_free(back); return front; } @@ -350,19 +338,10 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator) while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) { int cmp = compare_prefix(iter->iter0->refname, iter->prefix); - if (cmp < 0) continue; - - if (cmp > 0) { - /* - * As the source iterator is ordered, we - * can stop the iteration as soon as we see a - * refname that comes after the prefix: - */ - ok = ref_iterator_abort(iter->iter0); - break; - } + if (cmp > 0) + return ITER_DONE; if (iter->trim) { /* @@ -386,9 +365,6 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator) return ITER_OK; } - iter->iter0 = NULL; - if (ref_iterator_abort(ref_iterator) != ITER_DONE) - return ITER_ERROR; return ok; } @@ -401,23 +377,18 @@ static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator, return ref_iterator_peel(iter->iter0, peeled); } -static int prefix_ref_iterator_abort(struct ref_iterator *ref_iterator) +static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator) { struct prefix_ref_iterator *iter = (struct prefix_ref_iterator *)ref_iterator; - int ok = ITER_DONE; - - if (iter->iter0) - ok = ref_iterator_abort(iter->iter0); + ref_iterator_free(iter->iter0); free(iter->prefix); - base_ref_iterator_free(ref_iterator); - return ok; } static struct ref_iterator_vtable prefix_ref_iterator_vtable = { .advance = prefix_ref_iterator_advance, .peel = prefix_ref_iterator_peel, - .abort = prefix_ref_iterator_abort, + .release = prefix_ref_iterator_release, }; struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0, @@ -453,20 +424,14 @@ int do_for_each_ref_iterator(struct ref_iterator *iter, current_ref_iter = iter; while ((ok = ref_iterator_advance(iter)) == ITER_OK) { retval = fn(iter->refname, iter->referent, iter->oid, iter->flags, cb_data); - if (retval) { - /* - * If ref_iterator_abort() returns ITER_ERROR, - * we ignore that error in deference to the - * callback function's return value. - */ - ref_iterator_abort(iter); + if (retval) goto out; - } } out: current_ref_iter = old_ref_iter; if (ok == ITER_ERROR) - return -1; + retval = -1; + ref_iterator_free(iter); return retval; } diff --git a/refs/packed-backend.c b/refs/packed-backend.c index a7b6f74b6e3..38a1956d1a8 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -954,9 +954,6 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator) return ITER_OK; } - if (ref_iterator_abort(ref_iterator) != ITER_DONE) - ok = ITER_ERROR; - return ok; } @@ -976,23 +973,19 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator, } } -static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator) +static void packed_ref_iterator_release(struct ref_iterator *ref_iterator) { struct packed_ref_iterator *iter = (struct packed_ref_iterator *)ref_iterator; - int ok = ITER_DONE; - strbuf_release(&iter->refname_buf); free(iter->jump); release_snapshot(iter->snapshot); - base_ref_iterator_free(ref_iterator); - return ok; } static struct ref_iterator_vtable packed_ref_iterator_vtable = { .advance = packed_ref_iterator_advance, .peel = packed_ref_iterator_peel, - .abort = packed_ref_iterator_abort + .release = packed_ref_iterator_release, }; static int jump_list_entry_cmp(const void *va, const void *vb) @@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs, */ iter = packed_ref_iterator_begin(&refs->base, "", NULL, DO_FOR_EACH_INCLUDE_BROKEN); - if ((ok = ref_iterator_advance(iter)) != ITER_OK) + if ((ok = ref_iterator_advance(iter)) != ITER_OK) { + ref_iterator_free(iter); iter = NULL; + } i = 0; @@ -1411,8 +1406,10 @@ static int write_with_updates(struct packed_ref_store *refs, * the iterator over the unneeded * value. */ - if ((ok = ref_iterator_advance(iter)) != ITER_OK) + if ((ok = ref_iterator_advance(iter)) != ITER_OK) { + ref_iterator_free(iter); iter = NULL; + } cmp = +1; } else { /* @@ -1449,8 +1446,10 @@ static int write_with_updates(struct packed_ref_store *refs, peel_error ? NULL : &peeled)) goto write_error; - if ((ok = ref_iterator_advance(iter)) != ITER_OK) + if ((ok = ref_iterator_advance(iter)) != ITER_OK) { + ref_iterator_free(iter); iter = NULL; + } } else if (is_null_oid(&update->new_oid)) { /* * The update wants to delete the reference, @@ -1499,9 +1498,7 @@ static int write_with_updates(struct packed_ref_store *refs, get_tempfile_path(refs->tempfile), strerror(errno)); error: - if (iter) - ref_iterator_abort(iter); - + ref_iterator_free(iter); delete_tempfile(&refs->tempfile); return -1; } diff --git a/refs/ref-cache.c b/refs/ref-cache.c index 02f09e4df88..6457e02c1ea 100644 --- a/refs/ref-cache.c +++ b/refs/ref-cache.c @@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator) if (++level->index == level->dir->nr) { /* This level is exhausted; pop up a level */ if (--iter->levels_nr == 0) - return ref_iterator_abort(ref_iterator); + return ITER_DONE; continue; } @@ -452,21 +452,18 @@ static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator, return peel_object(iter->repo, ref_iterator->oid, peeled) ? -1 : 0; } -static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator) +static void cache_ref_iterator_release(struct ref_iterator *ref_iterator) { struct cache_ref_iterator *iter = (struct cache_ref_iterator *)ref_iterator; - free((char *)iter->prefix); free(iter->levels); - base_ref_iterator_free(ref_iterator); - return ITER_DONE; } static struct ref_iterator_vtable cache_ref_iterator_vtable = { .advance = cache_ref_iterator_advance, .peel = cache_ref_iterator_peel, - .abort = cache_ref_iterator_abort + .release = cache_ref_iterator_release, }; struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache, diff --git a/refs/refs-internal.h b/refs/refs-internal.h index aaab711bb96..27ff822cf43 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -273,11 +273,11 @@ enum do_for_each_ref_flags { * the next reference and returns ITER_OK. The data pointed at by * refname and oid belong to the iterator; if you want to retain them * after calling ref_iterator_advance() again or calling - * ref_iterator_abort(), you must make a copy. When the iteration has + * ref_iterator_free(), you must make a copy. When the iteration has * been exhausted, ref_iterator_advance() releases any resources * associated with the iteration, frees the ref_iterator object, and * returns ITER_DONE. If you want to abort the iteration early, call - * ref_iterator_abort(), which also frees the ref_iterator object and + * ref_iterator_free(), which also frees the ref_iterator object and * any associated resources. If there was an internal error advancing * to the next entry, ref_iterator_advance() aborts the iteration, * frees the ref_iterator, and returns ITER_ERROR. @@ -292,10 +292,8 @@ enum do_for_each_ref_flags { * struct ref_iterator *iter = ...; * * while ((ok = ref_iterator_advance(iter)) == ITER_OK) { - * if (want_to_stop_iteration()) { - * ok = ref_iterator_abort(iter); + * if (want_to_stop_iteration()) * break; - * } * * // Access information about the current reference: * if (!(iter->flags & REF_ISSYMREF)) @@ -307,6 +305,7 @@ enum do_for_each_ref_flags { * * if (ok != ITER_DONE) * handle_error(); + * ref_iterator_free(iter); */ struct ref_iterator { struct ref_iterator_vtable *vtable; @@ -333,12 +332,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator); int ref_iterator_peel(struct ref_iterator *ref_iterator, struct object_id *peeled); -/* - * End the iteration before it has been exhausted, freeing the - * reference iterator and any associated resources and returning - * ITER_DONE. If the abort itself failed, return ITER_ERROR. - */ -int ref_iterator_abort(struct ref_iterator *ref_iterator); +/* Free the reference iterator and any associated resources. */ +void ref_iterator_free(struct ref_iterator *ref_iterator); /* * An iterator over nothing (its first ref_iterator_advance() call @@ -438,13 +433,6 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0, void base_ref_iterator_init(struct ref_iterator *iter, struct ref_iterator_vtable *vtable); -/* - * Base class destructor for ref_iterators. Destroy the ref_iterator - * part of iter and shallow-free the object. This is meant to be - * called only by the destructors of derived classes. - */ -void base_ref_iterator_free(struct ref_iterator *iter); - /* Virtual function declarations for ref_iterators: */ /* @@ -463,15 +451,14 @@ typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator, /* * Implementations of this function should free any resources specific - * to the derived class, then call base_ref_iterator_free() to clean - * up and free the ref_iterator object. + * to the derived class. */ -typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator); +typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator); struct ref_iterator_vtable { ref_iterator_advance_fn *advance; ref_iterator_peel_fn *peel; - ref_iterator_abort_fn *abort; + ref_iterator_release_fn *release; }; /* diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 2a90e7cb391..06543f79c64 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -711,17 +711,10 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator) break; } - if (iter->err > 0) { - if (ref_iterator_abort(ref_iterator) != ITER_DONE) - return ITER_ERROR; + if (iter->err > 0) return ITER_DONE; - } - - if (iter->err < 0) { - ref_iterator_abort(ref_iterator); + if (iter->err < 0) return ITER_ERROR; - } - return ITER_OK; } @@ -740,7 +733,7 @@ static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator, return -1; } -static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator) +static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator) { struct reftable_ref_iterator *iter = (struct reftable_ref_iterator *)ref_iterator; @@ -751,14 +744,12 @@ static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator) free(iter->exclude_patterns[i]); free(iter->exclude_patterns); } - free(iter); - return ITER_DONE; } static struct ref_iterator_vtable reftable_ref_iterator_vtable = { .advance = reftable_ref_iterator_advance, .peel = reftable_ref_iterator_peel, - .abort = reftable_ref_iterator_abort + .release = reftable_ref_iterator_release, }; static int qsort_strcmp(const void *va, const void *vb) @@ -2017,17 +2008,10 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator) break; } - if (iter->err > 0) { - if (ref_iterator_abort(ref_iterator) != ITER_DONE) - return ITER_ERROR; + if (iter->err > 0) return ITER_DONE; - } - - if (iter->err < 0) { - ref_iterator_abort(ref_iterator); + if (iter->err < 0) return ITER_ERROR; - } - return ITER_OK; } @@ -2038,21 +2022,19 @@ static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSE return -1; } -static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator) +static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator) { struct reftable_reflog_iterator *iter = (struct reftable_reflog_iterator *)ref_iterator; reftable_log_record_release(&iter->log); reftable_iterator_destroy(&iter->iter); strbuf_release(&iter->last_name); - free(iter); - return ITER_DONE; } static struct ref_iterator_vtable reftable_reflog_iterator_vtable = { .advance = reftable_reflog_iterator_advance, .peel = reftable_reflog_iterator_peel, - .abort = reftable_reflog_iterator_abort + .release = reftable_reflog_iterator_release, }; static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs, diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c index 6b297bd7536..8d46e8ba409 100644 --- a/t/helper/test-dir-iterator.c +++ b/t/helper/test-dir-iterator.c @@ -53,6 +53,7 @@ int cmd__dir_iterator(int argc, const char **argv) printf("(%s) [%s] %s\n", diter->relative_path, diter->basename, diter->path.buf); } + dir_iterator_free(diter); if (iter_status != ITER_DONE) { printf("dir_iterator_advance failure\n");
The ref and reflog iterators have their lifecycle attached to iteration: once the iterator reaches its end, it is automatically released and the caller doesn't have to care about that anymore. When the iterator should be released before it has been exhausted, callers must explicitly abort the iterator via `ref_iterator_abort()`. This lifecycle is somewhat unusual in the Git codebase and creates two problems: - Callsites need to be very careful about when exactly they call `ref_iterator_abort()`, as calling the function is only valid when the iterator itself still is. This leads to somewhat awkward calling patterns in some situations. - It is impossible to reuse iterators and re-seek them to a different prefix. This feature isn't supported by any iterator implementation except for the reftable iterators anyway, but if it was implemented it would allow us to optimize cases where we need to search for specific references repeatedly by reusing internal state. Detangle the lifecycle from iteration so that we don't deallocate the iterator anymore once it is exhausted. Instead, callers are now expected to always call a newly introduce `ref_iterator_free()` function that deallocates the iterator and its internal state. While at it, drop the return value of `ref_iterator_abort()`, which wasn't really required by any of the iterator implementations anyway. Furthermore, stop calling `base_ref_iterator_free()` in any of the backends, but instead call it in `ref_iterator_free()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/clone.c | 2 + dir-iterator.c | 24 +++++------ dir-iterator.h | 13 ++---- refs.c | 7 +++- refs/debug.c | 9 ++--- refs/files-backend.c | 36 +++++------------ refs/iterator.c | 95 ++++++++++++++------------------------------ refs/packed-backend.c | 27 ++++++------- refs/ref-cache.c | 9 ++--- refs/refs-internal.h | 31 +++++---------- refs/reftable-backend.c | 34 ++++------------ t/helper/test-dir-iterator.c | 1 + 12 files changed, 99 insertions(+), 189 deletions(-)