diff mbox series

[07/14] refs/iterator: separate lifecycle from iteration

Message ID 20250217-pks-update-ref-optimization-v1-7-a2b6d87a24af@pks.im (mailing list archive)
State Superseded
Headers show
Series refs: batch refname availability checks | expand

Commit Message

Patrick Steinhardt Feb. 17, 2025, 3:50 p.m. UTC
The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.

This lifecycle is somewhat unusual in the Git codebase and creates two
problems:

  - Callsites need to be very careful about when exactly they call
    `ref_iterator_abort()`, as calling the function is only valid when
    the iterator itself still is. This leads to somewhat awkward calling
    patterns in some situations.

  - It is impossible to reuse iterators and re-seek them to a different
    prefix. This feature isn't supported by any iterator implementation
    except for the reftable iterators anyway, but if it was implemented
    it would allow us to optimize cases where we need to search for
    specific references repeatedly by reusing internal state.

Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduce `ref_iterator_free()` function that
deallocates the iterator and its internal state.

While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/clone.c              |  2 +
 dir-iterator.c               | 24 +++++------
 dir-iterator.h               | 13 ++----
 refs.c                       |  7 +++-
 refs/debug.c                 |  9 ++---
 refs/files-backend.c         | 36 +++++------------
 refs/iterator.c              | 95 ++++++++++++++------------------------------
 refs/packed-backend.c        | 27 ++++++-------
 refs/ref-cache.c             |  9 ++---
 refs/refs-internal.h         | 31 +++++----------
 refs/reftable-backend.c      | 34 ++++------------
 t/helper/test-dir-iterator.c |  1 +
 12 files changed, 99 insertions(+), 189 deletions(-)

Comments

shejialuo Feb. 18, 2025, 4:52 p.m. UTC | #1
On Mon, Feb 17, 2025 at 04:50:21PM +0100, Patrick Steinhardt wrote:
> The ref and reflog iterators have their lifecycle attached to iteration:
> once the iterator reaches its end, it is automatically released and the
> caller doesn't have to care about that anymore. When the iterator should
> be released before it has been exhausted, callers must explicitly abort
> the iterator via `ref_iterator_abort()`.
> 
> This lifecycle is somewhat unusual in the Git codebase and creates two
> problems:
> 
>   - Callsites need to be very careful about when exactly they call
>     `ref_iterator_abort()`, as calling the function is only valid when
>     the iterator itself still is. This leads to somewhat awkward calling
>     patterns in some situations.
> 

In what situations, the iterator has been disappeared when we call
`ref_iterator_abort`? Why is this awkward? When reading, I am really
curious about this.

>   - It is impossible to reuse iterators and re-seek them to a different
>     prefix. This feature isn't supported by any iterator implementation
>     except for the reftable iterators anyway, but if it was implemented
>     it would allow us to optimize cases where we need to search for
>     specific references repeatedly by reusing internal state.
> 

So, the reason why we cannot reuse the iterator is that we will
deallocate the iterator? So, below we want to detangle the lifecycle. I
don't know whether my understanding is correct.

> Detangle the lifecycle from iteration so that we don't deallocate the
> iterator anymore once it is exhausted. Instead, callers are now expected
> to always call a newly introduce `ref_iterator_free()` function that
> deallocates the iterator and its internal state.
> 

A design question: why do not just introduce a variable, for example,
`unsigned int free` to indicate whether we need to free the iterator?

> While at it, drop the return value of `ref_iterator_abort()`, which
> wasn't really required by any of the iterator implementations anyway.
> Furthermore, stop calling `base_ref_iterator_free()` in any of the
> backends, but instead call it in `ref_iterator_free()`.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/clone.c              |  2 +
>  dir-iterator.c               | 24 +++++------
>  dir-iterator.h               | 13 ++----
>  refs.c                       |  7 +++-
>  refs/debug.c                 |  9 ++---
>  refs/files-backend.c         | 36 +++++------------
>  refs/iterator.c              | 95 ++++++++++++++------------------------------
>  refs/packed-backend.c        | 27 ++++++-------
>  refs/ref-cache.c             |  9 ++---
>  refs/refs-internal.h         | 31 +++++----------
>  refs/reftable-backend.c      | 34 ++++------------
>  t/helper/test-dir-iterator.c |  1 +
>  12 files changed, 99 insertions(+), 189 deletions(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index fd001d800c6..ac3e84b2b18 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>  		strbuf_setlen(src, src_len);
>  		die(_("failed to iterate over '%s'"), src->buf);
>  	}
> +
> +	dir_iterator_free(iter);

Here, we explicitly free the iterator. This is a must, because we will
never free the iterator anymore.

>  }
>  
>  static void clone_local(const char *src_repo, const char *dest_repo)
> diff --git a/dir-iterator.c b/dir-iterator.c
> index de619846f29..857e1d9bdaf 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -193,9 +193,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  
>  	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
>  		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
> -			goto error_out;
> +			return ITER_ERROR;

Here, we just return `ITER_ERROR` instead of going `error_out` label
which will call `dir_iterator_abort` function to free the iterator, we
should make the caller free.

>  		if (iter->levels_nr == 0)
> -			goto error_out;
> +			return ITER_ERROR;
>  	}
>  
>  	/* Loop until we find an entry that we can give back to the caller. */
> @@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
>  			if (ret < 0) {
>  				if (iter->flags & DIR_ITERATOR_PEDANTIC)
> -					goto error_out;
> +					return ITER_ERROR;
>  				continue;
>  			} else if (ret > 0) {
>  				if (pop_level(iter) == 0)
> -					return dir_iterator_abort(dir_iterator);
> +					return ITER_DONE;

Instead of calling `dir_iterator_abort`, we just return `ITER_DONE`.
However, this does not make sense. We break the semantics of
`ITER_DONE`. Let me cite the comment from "iterator.h":

    /*
     * The iterator is exhausted and has been freed.
     */
    #define ITER_DONE -1

However, we don't free the iterator here.

>  				continue;
>  			}
>  
> @@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  		} else {
>  			if (level->entries_idx >= level->entries.nr) {
>  				if (pop_level(iter) == 0)
> -					return dir_iterator_abort(dir_iterator);
> +					return ITER_DONE;
>  				continue;
>  			}
>  
> @@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  
>  		if (prepare_next_entry_data(iter, name)) {
>  			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
> -				goto error_out;
> +				return ITER_ERROR;
>  			continue;
>  		}
>  
>  		return ITER_OK;
>  	}
> -
> -error_out:
> -	dir_iterator_abort(dir_iterator);
> -	return ITER_ERROR;
>  }
>  
> -int dir_iterator_abort(struct dir_iterator *dir_iterator)
> +void dir_iterator_free(struct dir_iterator *dir_iterator)

We rename `dir_iterator_abort` to `dir_iterator_free`. And we drop the
return value.

>  {
>  	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
>  
> +	if (!iter)
> +		return;
> +

Make sense, because we have no idea whether the `dir_iterator` is
exhausted. So we need to check whether it is valid.

>  	for (; iter->levels_nr; iter->levels_nr--) {
>  		struct dir_iterator_level *level =
>  			&iter->levels[iter->levels_nr - 1];
> @@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
>  	free(iter->levels);
>  	strbuf_release(&iter->base.path);
>  	free(iter);
> -	return ITER_DONE;

I am confused that why we don't provide `dir_iterator_release` for dir
iterator? For other ref related iterators, we will rename "*abort" to
"*release" and drop the operation to free the iterator. However, for dir
iterator, we rename its "*abort" to "*free".

I think this does not make sense. It causes inconsistency. Although dir
iterator does _not_ have "peel" method. But it does have "advance" and
"abort" methods just like ref iterators.

I think you have already considered this problem. I guess that's the
reason why in the below comment you typed `dir_iterator_release` instead
of `dir_iterator_free`.

>  }
>  
>  struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
> @@ -301,7 +299,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
>  	return dir_iterator;
>  
>  error_out:
> -	dir_iterator_abort(dir_iterator);
> +	dir_iterator_free(dir_iterator);
>  	errno = saved_errno;
>  	return NULL;
>  }
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 6d438809b6e..01f51f6bac1 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -27,10 +27,8 @@
>   *             goto error_handler;
>   *
>   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
> - *             if (want_to_stop_iteration()) {
> - *                     ok = dir_iterator_abort(iter);
> + *             if (want_to_stop_iteration())
>   *                     break;
> - *             }

Is this correct? If `want_to_stop_iteration()` is true, the `ok` will be
`ITER_OK` and then it breaks the loop to jump to check whether the `ok`
is `ITER_DONE`. Of course it is not, it will call `handle_error`. At
least, we should assign `ok = ITER_DONE`.

>   *
>   *             // Access information about the current path:
>   *             if (S_ISDIR(iter->st.st_mode))
> @@ -39,6 +37,7 @@
>   *
>   *     if (ok != ITER_DONE)
>   *             handle_error();
> + *     dir_iterator_release(iter);

I think this is a typo. Should `dir_iterator_release` be
`dir_iterator_free`?

>   *
>   * Callers are allowed to modify iter->path while they are working,
>   * but they must restore it to its original contents before calling
> @@ -107,11 +106,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
>   */
>  int dir_iterator_advance(struct dir_iterator *iterator);
>  
> -/*
> - * End the iteration before it has been exhausted. Free the
> - * dir_iterator and any associated resources and return ITER_DONE. On
> - * error, free the dir_iterator and return ITER_ERROR.
> - */
> -int dir_iterator_abort(struct dir_iterator *iterator);
> +/* Free the dir_iterator and any associated resources. */
> +void dir_iterator_free(struct dir_iterator *iterator);
>  
>  #endif
> diff --git a/refs.c b/refs.c
> index eaf41421f50..8eff60a2186 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  {
>  	struct strbuf dirname = STRBUF_INIT;
>  	struct strbuf referent = STRBUF_INIT;
> +	struct ref_iterator *iter = NULL;
>  	struct strset dirnames;
>  	int ret = -1;
>  
> @@ -2552,7 +2553,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  		strbuf_addch(&dirname, '/');
>  
>  		if (!initial_transaction) {
> -			struct ref_iterator *iter;
>  			int ok;
>  
>  			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
> @@ -2564,12 +2564,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  
>  				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
>  					    iter->refname, refname);
> -				ref_iterator_abort(iter);
>  				goto cleanup;
>  			}
>  
>  			if (ok != ITER_DONE)
>  				BUG("error while iterating over references");
> +
> +			ref_iterator_free(iter);
> +			iter = NULL;
>  		}
>  
>  		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
> @@ -2586,6 +2588,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  	strbuf_release(&referent);
>  	strbuf_release(&dirname);
>  	strset_clear(&dirnames);
> +	ref_iterator_free(iter);
>  	return ret;
>  }
>  
> diff --git a/refs/debug.c b/refs/debug.c
> index fbc4df08b43..a9786da4ba1 100644
> --- a/refs/debug.c
> +++ b/refs/debug.c
> @@ -179,19 +179,18 @@ static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  	return res;
>  }
>  
> -static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
> +static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)

Here, we rename "*abort" to "*release" for debug ref and call the
corresponding `release` method.

>  {
>  	struct debug_ref_iterator *diter =
>  		(struct debug_ref_iterator *)ref_iterator;
> -	int res = diter->iter->vtable->abort(diter->iter);
> -	trace_printf_key(&trace_refs, "iterator_abort: %d\n", res);
> -	return res;
> +	diter->iter->vtable->release(diter->iter);
> +	trace_printf_key(&trace_refs, "iterator_abort\n");
>  }
>  
>  static struct ref_iterator_vtable debug_ref_iterator_vtable = {
>  	.advance = debug_ref_iterator_advance,
>  	.peel = debug_ref_iterator_peel,
> -	.abort = debug_ref_iterator_abort,
> +	.release = debug_ref_iterator_release,
>  };
>  
>  static struct ref_iterator *
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 29f08dced40..9511b6f3448 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -919,10 +919,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		return ITER_OK;
>  	}
>  
> -	iter->iter0 = NULL;
> -	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
> -		ok = ITER_ERROR;
> -

We don't abort the iterator in `advance`. Is this because we want to
reuse this iterator?

>  	return ok;
>  }
>  

[snip]

> @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  	return ref_iterator_peel(iter->iter0, peeled);
>  }
>  
> -int ref_iterator_abort(struct ref_iterator *ref_iterator)
> +void ref_iterator_free(struct ref_iterator *ref_iterator)
>  {
> -	return ref_iterator->vtable->abort(ref_iterator);
> +	if (ref_iterator) {
> +		ref_iterator->vtable->release(ref_iterator);
> +		/* Help make use-after-free bugs fail quickly: */
> +		ref_iterator->vtable = NULL;
> +		free(ref_iterator);
> +	}
>  }
>  

So, when calling `ref_iterator_free`, we will call corresponding
"release" method to release the resources associated and then we free
the iterator. Would this be too complicated? From my view, we could just
make the `ref_iterator_abort` name unchanged but add a new variable such
as "unsigned int free_iterator". And we change each "abort" callback to
avoid free the iterator. This would be much simpler.

[snip]

> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 02f09e4df88..6457e02c1ea 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		if (++level->index == level->dir->nr) {
>  			/* This level is exhausted; pop up a level */
>  			if (--iter->levels_nr == 0)
> -				return ref_iterator_abort(ref_iterator);
> +				return ITER_DONE;

As I have said, simply return `ITER_DONE` breaks the semantics of the
`ITER_DONE`.

>  
>  			continue;
>  		}
> @@ -452,21 +452,18 @@ static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  	return peel_object(iter->repo, ref_iterator->oid, peeled) ? -1 : 0;
>  }
>  
> -static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
> +static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
>  {
>  	struct cache_ref_iterator *iter =
>  		(struct cache_ref_iterator *)ref_iterator;
> -
>  	free((char *)iter->prefix);
>  	free(iter->levels);
> -	base_ref_iterator_free(ref_iterator);
> -	return ITER_DONE;
>  }
>  
>  static struct ref_iterator_vtable cache_ref_iterator_vtable = {
>  	.advance = cache_ref_iterator_advance,
>  	.peel = cache_ref_iterator_peel,
> -	.abort = cache_ref_iterator_abort
> +	.release = cache_ref_iterator_release,
>  };
>  
>  struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index aaab711bb96..27ff822cf43 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -273,11 +273,11 @@ enum do_for_each_ref_flags {
>   * the next reference and returns ITER_OK. The data pointed at by
>   * refname and oid belong to the iterator; if you want to retain them
>   * after calling ref_iterator_advance() again or calling
> - * ref_iterator_abort(), you must make a copy. When the iteration has
> + * ref_iterator_free(), you must make a copy. When the iteration has
>   * been exhausted, ref_iterator_advance() releases any resources
>   * associated with the iteration, frees the ref_iterator object, and
>   * returns ITER_DONE. If you want to abort the iteration early, call
> - * ref_iterator_abort(), which also frees the ref_iterator object and
> + * ref_iterator_free(), which also frees the ref_iterator object and
>   * any associated resources. If there was an internal error advancing
>   * to the next entry, ref_iterator_advance() aborts the iteration,
>   * frees the ref_iterator, and returns ITER_ERROR.
> @@ -292,10 +292,8 @@ enum do_for_each_ref_flags {
>   *     struct ref_iterator *iter = ...;
>   *
>   *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
> - *             if (want_to_stop_iteration()) {
> - *                     ok = ref_iterator_abort(iter);
> + *             if (want_to_stop_iteration())
>   *                     break;
> - *             }

I also think here we have problem, at least we should set `ok =
ITER_DONE`.

>   *
>   *             // Access information about the current reference:
>   *             if (!(iter->flags & REF_ISSYMREF))
> @@ -307,6 +305,7 @@ enum do_for_each_ref_flags {
>   *
>   *     if (ok != ITER_DONE)
>   *             handle_error();
> + *     ref_iterator_free(iter);
>   */
>  struct ref_iterator {
>  	struct ref_iterator_vtable *vtable;
> @@ -333,12 +332,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator);
>  int ref_iterator_peel(struct ref_iterator *ref_iterator,
>  		      struct object_id *peeled);
>  
> -/*
> - * End the iteration before it has been exhausted, freeing the
> - * reference iterator and any associated resources and returning
> - * ITER_DONE. If the abort itself failed, return ITER_ERROR.
> - */
> -int ref_iterator_abort(struct ref_iterator *ref_iterator);
> +/* Free the reference iterator and any associated resources. */
> +void ref_iterator_free(struct ref_iterator *ref_iterator);
>  
>  /*
>   * An iterator over nothing (its first ref_iterator_advance() call
> @@ -438,13 +433,6 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
>  void base_ref_iterator_init(struct ref_iterator *iter,
>  			    struct ref_iterator_vtable *vtable);
>  
> -/*
> - * Base class destructor for ref_iterators. Destroy the ref_iterator
> - * part of iter and shallow-free the object. This is meant to be
> - * called only by the destructors of derived classes.
> - */
> -void base_ref_iterator_free(struct ref_iterator *iter);
> -

OK, here we delete `base_ref_iterator_free`. Because we will use
`ref_iterator_free`.

>  /* Virtual function declarations for ref_iterators: */
>  
>  /*
> @@ -463,15 +451,14 @@ typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
>  
>  /*
>   * Implementations of this function should free any resources specific
> - * to the derived class, then call base_ref_iterator_free() to clean
> - * up and free the ref_iterator object.
> + * to the derived class.
>   */
> -typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator);
> +typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);

So here why we name this function to `release` is that we want to
release any resource related to iterator but we don't free iterator
itself.

[snip]


---

Some my thinking after reading this whole patch:

1. We somehow misuses the "ITER_DONE". We either need to adjust its
meaning (we just delete the iterator is freed part) or introduces a new
state to represent that the iteration is done but we don't release its
resource and free the iterator itself.

2. I don't think we need to make things complicated. From my
understanding, the motivation here is that we don't want to `advance`
callback to call `abort` callback. I want to ask an _important_ question
here: what is the motivation we rename `abort` to `release` in the first
place? As far as I know, we only call this callback in the newly created
"ref_iterator_free". Although release may be more accurate, this change
truly causes overhead of this patch.

3. If the motivation is that we don't want to `advance` callback to call
`abort` callback. I think we could just let the user call `abort`
callback for the following two situations:

    1. We have exhausted the iteration. It returns `ITER_OK`.
    2. We encountered the error, it returns `ITER_ERROR`.

And we give the freedom to the caller. It's their duty to call
`ref_iterator_abort` which cleans the resource and free the iterator.

Writing here, I have always thought that there is a situation that we
just want to release the resources but not want to free the iterator
itself. That's why I am wondering why just add a new variable to do.
However, if we just want to make the lifecycle out, we just delete each
"abort" code where it frees the iterator. And we free the iterator in
the "ref_iterator_abort". Should this be enough?

Thanks,
Jialuo
Karthik Nayak Feb. 18, 2025, 5:13 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> The ref and reflog iterators have their lifecycle attached to iteration:
> once the iterator reaches its end, it is automatically released and the
> caller doesn't have to care about that anymore. When the iterator should
> be released before it has been exhausted, callers must explicitly abort
> the iterator via `ref_iterator_abort()`.
>
> This lifecycle is somewhat unusual in the Git codebase and creates two
> problems:
>
>   - Callsites need to be very careful about when exactly they call
>     `ref_iterator_abort()`, as calling the function is only valid when
>     the iterator itself still is. This leads to somewhat awkward calling
>     patterns in some situations.
>
>   - It is impossible to reuse iterators and re-seek them to a different
>     prefix. This feature isn't supported by any iterator implementation
>     except for the reftable iterators anyway, but if it was implemented
>     it would allow us to optimize cases where we need to search for
>     specific references repeatedly by reusing internal state.
>
> Detangle the lifecycle from iteration so that we don't deallocate the
> iterator anymore once it is exhausted. Instead, callers are now expected
> to always call a newly introduce `ref_iterator_free()` function that
> deallocates the iterator and its internal state.
>
> While at it, drop the return value of `ref_iterator_abort()`, which
> wasn't really required by any of the iterator implementations anyway.
> Furthermore, stop calling `base_ref_iterator_free()` in any of the
> backends, but instead call it in `ref_iterator_free()`.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/clone.c              |  2 +
>  dir-iterator.c               | 24 +++++------
>  dir-iterator.h               | 13 ++----
>  refs.c                       |  7 +++-
>  refs/debug.c                 |  9 ++---
>  refs/files-backend.c         | 36 +++++------------
>  refs/iterator.c              | 95 ++++++++++++++------------------------------
>  refs/packed-backend.c        | 27 ++++++-------
>  refs/ref-cache.c             |  9 ++---
>  refs/refs-internal.h         | 31 +++++----------
>  refs/reftable-backend.c      | 34 ++++------------
>  t/helper/test-dir-iterator.c |  1 +
>  12 files changed, 99 insertions(+), 189 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index fd001d800c6..ac3e84b2b18 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>  		strbuf_setlen(src, src_len);
>  		die(_("failed to iterate over '%s'"), src->buf);
>  	}
> +
> +	dir_iterator_free(iter);
>  }
>

A bit puzzled to see `dir_iterator_*` change here, I'm assuming it's
linked to the 'files-backend' and perhaps similar to the changes
mentioned about `ref_iterator_*` in the commit message. Would be nice to
call out in the commit message too.

[snip]

> @@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  		} else {
>  			if (level->entries_idx >= level->entries.nr) {
>  				if (pop_level(iter) == 0)
> -					return dir_iterator_abort(dir_iterator);
> +					return ITER_DONE;
>  				continue;
>  			}
>
> @@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>
>  		if (prepare_next_entry_data(iter, name)) {
>  			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
> -				goto error_out;
> +				return ITER_ERROR;
>  			continue;
>  		}
>
>  		return ITER_OK;
>  	}
> -
> -error_out:
> -	dir_iterator_abort(dir_iterator);
> -	return ITER_ERROR;
>  }

Okay yeah, we're getting rid of `dir_iterator_abort` so potentially add
`dir_iterator_free` below

>
> -int dir_iterator_abort(struct dir_iterator *dir_iterator)
> +void dir_iterator_free(struct dir_iterator *dir_iterator)
>  {
>  	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
>
> +	if (!iter)
> +		return;
> +
>  	for (; iter->levels_nr; iter->levels_nr--) {
>  		struct dir_iterator_level *level =
>  			&iter->levels[iter->levels_nr - 1];
> @@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
>  	free(iter->levels);
>  	strbuf_release(&iter->base.path);
>  	free(iter);
> -	return ITER_DONE;
>  }
>
>  struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)

Okay this makes sense!

[snip]

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 29f08dced40..9511b6f3448 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -919,10 +919,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		return ITER_OK;
>  	}
>
> -	iter->iter0 = NULL;
> -	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
> -		ok = ITER_ERROR;
> -
>

Since we're explicitly going to call `ref_iterator_free`, this makes sense.

>  	return ok;
>  }
>
> @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  	return ref_iterator_peel(iter->iter0, peeled);
>  }
>
> -static int files_ref_iterator_abort(struct ref_iterator *ref_iterator)
> +static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
>  {
>  	struct files_ref_iterator *iter =
>  		(struct files_ref_iterator *)ref_iterator;
> -	int ok = ITER_DONE;
> -
> -	if (iter->iter0)
> -		ok = ref_iterator_abort(iter->iter0);
> -
> -	base_ref_iterator_free(ref_iterator);
> -	return ok;
> +	ref_iterator_free(iter->iter0);
>  }
>

I like how much more cleaner it looks now.

[snip]

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index a7b6f74b6e3..38a1956d1a8 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -954,9 +954,6 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		return ITER_OK;
>  	}
>
> -	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
> -		ok = ITER_ERROR;
> -
>  	return ok;
>  }
>

The merged_iterator is used to combine the files and packed backend
iterators to provide a uniform view over them. Likewise the changes here
seem similar too.

> @@ -976,23 +973,19 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  	}
>  }
>
> -static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
> +static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
>  {
>  	struct packed_ref_iterator *iter =
>  		(struct packed_ref_iterator *)ref_iterator;
> -	int ok = ITER_DONE;
> -
>  	strbuf_release(&iter->refname_buf);
>  	free(iter->jump);
>  	release_snapshot(iter->snapshot);
> -	base_ref_iterator_free(ref_iterator);
> -	return ok;
>  }
>
>  static struct ref_iterator_vtable packed_ref_iterator_vtable = {
>  	.advance = packed_ref_iterator_advance,
>  	.peel = packed_ref_iterator_peel,
> -	.abort = packed_ref_iterator_abort
> +	.release = packed_ref_iterator_release,
>  };
>
>  static int jump_list_entry_cmp(const void *va, const void *vb)
> @@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs,
>  	 */
>  	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
>  					 DO_FOR_EACH_INCLUDE_BROKEN);
> -	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
> +	if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
> +		ref_iterator_free(iter);

Nit: Since we don't return early here, wouldn't the `ref_iterator_free`
at the end of the function be sufficient? I think the only early return
when `iter == NULL` is towards the end of the function, where it might
be better to add a `goto error`.

[snip]
Patrick Steinhardt Feb. 19, 2025, 11:52 a.m. UTC | #3
On Wed, Feb 19, 2025 at 12:52:23AM +0800, shejialuo wrote:
> On Mon, Feb 17, 2025 at 04:50:21PM +0100, Patrick Steinhardt wrote:
> > The ref and reflog iterators have their lifecycle attached to iteration:
> > once the iterator reaches its end, it is automatically released and the
> > caller doesn't have to care about that anymore. When the iterator should
> > be released before it has been exhausted, callers must explicitly abort
> > the iterator via `ref_iterator_abort()`.
> > 
> > This lifecycle is somewhat unusual in the Git codebase and creates two
> > problems:
> > 
> >   - Callsites need to be very careful about when exactly they call
> >     `ref_iterator_abort()`, as calling the function is only valid when
> >     the iterator itself still is. This leads to somewhat awkward calling
> >     patterns in some situations.
> > 
> 
> In what situations, the iterator has been disappeared when we call
> `ref_iterator_abort`? Why is this awkward? When reading, I am really
> curious about this.

It leads to patterns where you have to call `ref_iterator_abort()`, but
only in a subset of code paths. You need to be really careful about the
lifetime of the iterator, and I had several occasions where I was doing
the wrong thing because I missed this subtlety. Compared to that, it is
trivial (and less code as demonstrated by the patch) to unconditionally
call `ref_iterator_free()`.

> >   - It is impossible to reuse iterators and re-seek them to a different
> >     prefix. This feature isn't supported by any iterator implementation
> >     except for the reftable iterators anyway, but if it was implemented
> >     it would allow us to optimize cases where we need to search for
> >     specific references repeatedly by reusing internal state.
> > 
> 
> So, the reason why we cannot reuse the iterator is that we will
> deallocate the iterator? So, below we want to detangle the lifecycle. I
> don't know whether my understanding is correct.

Yup, your understanding is correct. A deallocated iterator cannot be
reused.

> > Detangle the lifecycle from iteration so that we don't deallocate the
> > iterator anymore once it is exhausted. Instead, callers are now expected
> > to always call a newly introduce `ref_iterator_free()` function that
> > deallocates the iterator and its internal state.
> > 
> 
> A design question: why do not just introduce a variable, for example,
> `unsigned int free` to indicate whether we need to free the iterator?

Because that would make the code even more subtle than it already is.
Doing things unconditionally is always easier to reason about than doing
them conditionally.

> > diff --git a/dir-iterator.c b/dir-iterator.c
> > index de619846f29..857e1d9bdaf 100644
> > --- a/dir-iterator.c
> > +++ b/dir-iterator.c
> > @@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >  			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
> >  			if (ret < 0) {
> >  				if (iter->flags & DIR_ITERATOR_PEDANTIC)
> > -					goto error_out;
> > +					return ITER_ERROR;
> >  				continue;
> >  			} else if (ret > 0) {
> >  				if (pop_level(iter) == 0)
> > -					return dir_iterator_abort(dir_iterator);
> > +					return ITER_DONE;
> 
> Instead of calling `dir_iterator_abort`, we just return `ITER_DONE`.
> However, this does not make sense. We break the semantics of
> `ITER_DONE`. Let me cite the comment from "iterator.h":
> 
>     /*
>      * The iterator is exhausted and has been freed.
>      */
>     #define ITER_DONE -1
> 
> However, we don't free the iterator here.

Oh, yeah, I'll have to adapt that comment, thanks for pointing it out.

> > @@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
> >  	free(iter->levels);
> >  	strbuf_release(&iter->base.path);
> >  	free(iter);
> > -	return ITER_DONE;
> 
> I am confused that why we don't provide `dir_iterator_release` for dir
> iterator? For other ref related iterators, we will rename "*abort" to
> "*release" and drop the operation to free the iterator. However, for dir
> iterator, we rename its "*abort" to "*free".
> 
> I think this does not make sense. It causes inconsistency. Although dir
> iterator does _not_ have "peel" method. But it does have "advance" and
> "abort" methods just like ref iterators.
> 
> I think you have already considered this problem. I guess that's the
> reason why in the below comment you typed `dir_iterator_release` instead
> of `dir_iterator_free`.

The `dir_iterator` itself is already being inconsistent with every other
iterator that we have because it is not a `ref_iterator`. It is _used_
to implement ref iterators, but doesn't provide the same interface. And
because of that we cannot rely on `ref_iterator_free()` to free it, but
must instead provide `dir_iterator_free()` to do so.

I have amended the commit message to explain why this one is special.

> > diff --git a/dir-iterator.h b/dir-iterator.h
> > index 6d438809b6e..01f51f6bac1 100644
> > --- a/dir-iterator.h
> > +++ b/dir-iterator.h
> > @@ -27,10 +27,8 @@
> >   *             goto error_handler;
> >   *
> >   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
> > - *             if (want_to_stop_iteration()) {
> > - *                     ok = dir_iterator_abort(iter);
> > + *             if (want_to_stop_iteration())
> >   *                     break;
> > - *             }
> 
> Is this correct? If `want_to_stop_iteration()` is true, the `ok` will be
> `ITER_OK` and then it breaks the loop to jump to check whether the `ok`
> is `ITER_DONE`. Of course it is not, it will call `handle_error`. At
> least, we should assign `ok = ITER_DONE`.

True, fixed.

> >   *
> >   *             // Access information about the current path:
> >   *             if (S_ISDIR(iter->st.st_mode))
> > @@ -39,6 +37,7 @@
> >   *
> >   *     if (ok != ITER_DONE)
> >   *             handle_error();
> > + *     dir_iterator_release(iter);
> 
> I think this is a typo. Should `dir_iterator_release` be
> `dir_iterator_free`?

Yup, fixed.

> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 29f08dced40..9511b6f3448 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -919,10 +919,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  		return ITER_OK;
> >  	}
> >  
> > -	iter->iter0 = NULL;
> > -	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
> > -		ok = ITER_ERROR;
> > -
> 
> We don't abort the iterator in `advance`. Is this because we want to
> reuse this iterator?

Exactly, this is basically the gist of this whole patch.

> > @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
> >  	return ref_iterator_peel(iter->iter0, peeled);
> >  }
> >  
> > -int ref_iterator_abort(struct ref_iterator *ref_iterator)
> > +void ref_iterator_free(struct ref_iterator *ref_iterator)
> >  {
> > -	return ref_iterator->vtable->abort(ref_iterator);
> > +	if (ref_iterator) {
> > +		ref_iterator->vtable->release(ref_iterator);
> > +		/* Help make use-after-free bugs fail quickly: */
> > +		ref_iterator->vtable = NULL;
> > +		free(ref_iterator);
> > +	}
> >  }
> >  
> 
> So, when calling `ref_iterator_free`, we will call corresponding
> "release" method to release the resources associated and then we free
> the iterator. Would this be too complicated? From my view, we could just
> make the `ref_iterator_abort` name unchanged but add a new variable such
> as "unsigned int free_iterator". And we change each "abort" callback to
> avoid free the iterator. This would be much simpler.

I think adding a separate variable to track whether or not things should
be freed would make things way more complicated. I would claim the
opposite: the fact that the patch removes 100 lines of code demonstrates
quite neatly that the new design is way simpler and needs less logic.

It's also simpler to reason about from my perspective: you allocate an
iterator, you free it. No conditionals, no nothing.

> [snip]
> 
> > diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> > index 02f09e4df88..6457e02c1ea 100644
> > --- a/refs/ref-cache.c
> > +++ b/refs/ref-cache.c
> > @@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  		if (++level->index == level->dir->nr) {
> >  			/* This level is exhausted; pop up a level */
> >  			if (--iter->levels_nr == 0)
> > -				return ref_iterator_abort(ref_iterator);
> > +				return ITER_DONE;
> 
> As I have said, simply return `ITER_DONE` breaks the semantics of the
> `ITER_DONE`.

It doesn't: `ITER_DONE` indicates that the iterator has been exhausted.
The current comment is stale though, as it claims that the iterator will
also have been free'd.

> > diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> > index aaab711bb96..27ff822cf43 100644
> > --- a/refs/refs-internal.h
> > +++ b/refs/refs-internal.h
> > @@ -292,10 +292,8 @@ enum do_for_each_ref_flags {
> >   *     struct ref_iterator *iter = ...;
> >   *
> >   *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
> > - *             if (want_to_stop_iteration()) {
> > - *                     ok = ref_iterator_abort(iter);
> > + *             if (want_to_stop_iteration())
> >   *                     break;
> > - *             }
> 
> I also think here we have problem, at least we should set `ok =
> ITER_DONE`.

Yup, same as in the other comment indeed.

> Some my thinking after reading this whole patch:
> 
> 1. We somehow misuses the "ITER_DONE". We either need to adjust its
> meaning (we just delete the iterator is freed part) or introduces a new
> state to represent that the iteration is done but we don't release its
> resource and free the iterator itself.

Yup, addressed.

> 2. I don't think we need to make things complicated. From my
> understanding, the motivation here is that we don't want to `advance`
> callback to call `abort` callback. I want to ask an _important_ question
> here: what is the motivation we rename `abort` to `release` in the first
> place? As far as I know, we only call this callback in the newly created
> "ref_iterator_free". Although release may be more accurate, this change
> truly causes overhead of this patch.

We have to touch up all of these functions anyway, so renaming them at
the same point doesn't feel like it adds any more complexity.

> 3. If the motivation is that we don't want to `advance` callback to call
> `abort` callback. I think we could just let the user call `abort`
> callback for the following two situations:
> 
>     1. We have exhausted the iteration. It returns `ITER_OK`.

This is impossible because the caller wouldn't be able to discern an
exhausted iterator from an iterator that still has entries. We have to
discern `ITER_OK` and `ITER_DONE`.

>     2. We encountered the error, it returns `ITER_ERROR`.

Yup.

> And we give the freedom to the caller. It's their duty to call
> `ref_iterator_abort` which cleans the resource and free the iterator.

Yup.

> Writing here, I have always thought that there is a situation that we
> just want to release the resources but not want to free the iterator
> itself. That's why I am wondering why just add a new variable to do.
> However, if we just want to make the lifecycle out, we just delete each
> "abort" code where it frees the iterator. And we free the iterator in
> the "ref_iterator_abort". Should this be enough?

As mentioned, I don't think a new variable would lead to a simplified
architecture. With re-seekable iterators the caller is the one who needs
to control whether the iterator should or should not be free'd, as they
are the only one who knows whether they want to reuse it. So making it
the responsibility of the callers to release the iterator is the proper
way to do it.

I also don't quite see the complexity argument. The patch rather clearly
shows that we're _reducing_ complexity as it allows us to drop around a
100 lines of code. We can stop worrying about whether or not we have to
call `ref_iterator_abort()` and don't have to worry about any conditions
that leak from the iterator subsystem to the callers. We just free the
iterator once we're done with it and call it a day.

Thanks for your input!

Patrick
Patrick Steinhardt Feb. 19, 2025, 11:52 a.m. UTC | #4
On Tue, Feb 18, 2025 at 09:13:55AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index fd001d800c6..ac3e84b2b18 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> >  		strbuf_setlen(src, src_len);
> >  		die(_("failed to iterate over '%s'"), src->buf);
> >  	}
> > +
> > +	dir_iterator_free(iter);
> >  }
> >
> 
> A bit puzzled to see `dir_iterator_*` change here, I'm assuming it's
> linked to the 'files-backend' and perhaps similar to the changes
> mentioned about `ref_iterator_*` in the commit message. Would be nice to
> call out in the commit message too.

Yeah, that's the reason. I've added a note to the commit message.

> > diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> > index a7b6f74b6e3..38a1956d1a8 100644
> > --- a/refs/packed-backend.c
> > +++ b/refs/packed-backend.c
> > @@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs,
> >  	 */
> >  	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
> >  					 DO_FOR_EACH_INCLUDE_BROKEN);
> > -	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
> > +	if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
> > +		ref_iterator_free(iter);
> 
> Nit: Since we don't return early here, wouldn't the `ref_iterator_free`
> at the end of the function be sufficient? I think the only early return
> when `iter == NULL` is towards the end of the function, where it might
> be better to add a `goto error`.

Yeah, the code here could definitely be improved with the new semantics.
But I was aiming to keep the required refactoring work at callsites to
the bare minimum, so I'd rather prefer to keep this unchanged.

Patrick
shejialuo Feb. 19, 2025, 12:41 p.m. UTC | #5
On Wed, Feb 19, 2025 at 12:52:06PM +0100, Patrick Steinhardt wrote:
> > > @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
> > >  	return ref_iterator_peel(iter->iter0, peeled);
> > >  }
> > >  
> > > -int ref_iterator_abort(struct ref_iterator *ref_iterator)
> > > +void ref_iterator_free(struct ref_iterator *ref_iterator)
> > >  {
> > > -	return ref_iterator->vtable->abort(ref_iterator);
> > > +	if (ref_iterator) {
> > > +		ref_iterator->vtable->release(ref_iterator);
> > > +		/* Help make use-after-free bugs fail quickly: */
> > > +		ref_iterator->vtable = NULL;
> > > +		free(ref_iterator);
> > > +	}
> > >  }
> > >  
> > 
> > So, when calling `ref_iterator_free`, we will call corresponding
> > "release" method to release the resources associated and then we free
> > the iterator. Would this be too complicated? From my view, we could just
> > make the `ref_iterator_abort` name unchanged but add a new variable such
> > as "unsigned int free_iterator". And we change each "abort" callback to
> > avoid free the iterator. This would be much simpler.
> 
> I think adding a separate variable to track whether or not things should
> be freed would make things way more complicated. I would claim the
> opposite: the fact that the patch removes 100 lines of code demonstrates
> quite neatly that the new design is way simpler and needs less logic.
> 

I agree with you with the current implementation, we don't need to worry
about calling "abort" callback in "advance". And the logic is simpler.
When writing this comment, I have thought that there is a situation that
we just want to call "release" callback. So, that's the reason why I
have asked you why not just add a new variable.

However, we don't call "release" callback expect in `ref_iterator_free`.

[snip]

> > 2. I don't think we need to make things complicated. From my
> > understanding, the motivation here is that we don't want to `advance`
> > callback to call `abort` callback. I want to ask an _important_ question
> > here: what is the motivation we rename `abort` to `release` in the first
> > place? As far as I know, we only call this callback in the newly created
> > "ref_iterator_free". Although release may be more accurate, this change
> > truly causes overhead of this patch.
> 
> We have to touch up all of these functions anyway, so renaming them at
> the same point doesn't feel like it adds any more complexity.
> 

I will explain this in the later.

[snip]

> > Writing here, I have always thought that there is a situation that we
> > just want to release the resources but not want to free the iterator
> > itself. That's why I am wondering why just add a new variable to do.
> > However, if we just want to make the lifecycle out, we just delete each
> > "abort" code where it frees the iterator. And we free the iterator in
> > the "ref_iterator_abort". Should this be enough?
> 
> As mentioned, I don't think a new variable would lead to a simplified
> architecture. With re-seekable iterators the caller is the one who needs
> to control whether the iterator should or should not be free'd, as they
> are the only one who knows whether they want to reuse it. So making it
> the responsibility of the callers to release the iterator is the proper
> way to do it.
> 

Yes, actually I think you have misunderstood my meaning here. I have
abandoned the idea to add a new variable when writing here. After
reading through the patch, I know that your motivation. My point is
"However, ..." part.

> I also don't quite see the complexity argument. The patch rather clearly
> shows that we're _reducing_ complexity as it allows us to drop around a
> 100 lines of code. We can stop worrying about whether or not we have to
> call `ref_iterator_abort()` and don't have to worry about any conditions
> that leak from the iterator subsystem to the callers. We just free the
> iterator once we're done with it and call it a day.
> 

Yes, I agree with you that we truly reduce complexity here. And as you
have said, the user allocate the iterator and free the iterator. With
this, we make call sequence clearer.

But there is one thing I want to argue with. I don't think we need to
rename "abort" callback to "release" and also "ref_iterator_abort" to
"ref_iterator_free" for the following reasons:

1. We never call "release" expect in the "ref_iterator_free" function.
For other exposed functions "ref_iterator_advance", "ref_iterator_peel"
and the original "ref_iterator_abort". We will just call the registered
callback "advance", "peel" or "abort" via virtual table. I somehow think
we should follow this pattern. But I don't know actually.
2. When I read the patch yesterday, I really wonder what is the
difference between "release" and "free". Why do we only change the
"ref_iterator_abort" to "ref_iterator_free" but for the callback, we
rename "abort" to "release". I know that you want to distinguish to
emphasis that we won't free the iterator but only release its resource
for ref iterator. But could abort also mean this?

Thanks,
Jialuo
Patrick Steinhardt Feb. 19, 2025, 12:59 p.m. UTC | #6
On Wed, Feb 19, 2025 at 08:41:03PM +0800, shejialuo wrote:
> But there is one thing I want to argue with. I don't think we need to
> rename "abort" callback to "release" and also "ref_iterator_abort" to
> "ref_iterator_free" for the following reasons:
> 
> 1. We never call "release" expect in the "ref_iterator_free" function.
> For other exposed functions "ref_iterator_advance", "ref_iterator_peel"
> and the original "ref_iterator_abort". We will just call the registered
> callback "advance", "peel" or "abort" via virtual table. I somehow think
> we should follow this pattern. But I don't know actually.
> 2. When I read the patch yesterday, I really wonder what is the
> difference between "release" and "free". Why do we only change the
> "ref_iterator_abort" to "ref_iterator_free" but for the callback, we
> rename "abort" to "release". I know that you want to distinguish to
> emphasis that we won't free the iterator but only release its resource
> for ref iterator. But could abort also mean this?

The difference between "release" and "free" is explicitly documented in
our CodingGuidelines. Quoting the relevant parts:

    - `S_release()` releases a structure's contents without freeing the
      structure.

    - `S_free()` releases a structure's contents and frees the
      structure.

So following these coding guidelines, we have to call the underlying
implementations that are specific to the iterators `release()` because
they don't free the iterator itself. And because the generic part _does_
free the iterator itself in addition to releasing its state, it has to
be called `free()`.

Regarding the question why to even rename `ref_iterator_abort()` itself:
this is done to avoid confusion going forward. Previously it really only
had to be called when you actually wanted to abort an ongoing iteration
over its yielded references. This is not the case anymore, and now you
have to call it unconditionally after you're done with the iterator. So
while the naming previously made sense, now it doesn't anymore.

Patrick
shejialuo Feb. 19, 2025, 1:06 p.m. UTC | #7
On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote:
> On Wed, Feb 19, 2025 at 08:41:03PM +0800, shejialuo wrote:
> > But there is one thing I want to argue with. I don't think we need to
> > rename "abort" callback to "release" and also "ref_iterator_abort" to
> > "ref_iterator_free" for the following reasons:
> > 
> > 1. We never call "release" expect in the "ref_iterator_free" function.
> > For other exposed functions "ref_iterator_advance", "ref_iterator_peel"
> > and the original "ref_iterator_abort". We will just call the registered
> > callback "advance", "peel" or "abort" via virtual table. I somehow think
> > we should follow this pattern. But I don't know actually.
> > 2. When I read the patch yesterday, I really wonder what is the
> > difference between "release" and "free". Why do we only change the
> > "ref_iterator_abort" to "ref_iterator_free" but for the callback, we
> > rename "abort" to "release". I know that you want to distinguish to
> > emphasis that we won't free the iterator but only release its resource
> > for ref iterator. But could abort also mean this?
> 
> The difference between "release" and "free" is explicitly documented in
> our CodingGuidelines. Quoting the relevant parts:
> 
>     - `S_release()` releases a structure's contents without freeing the
>       structure.
> 
>     - `S_free()` releases a structure's contents and frees the
>       structure.
> 
> So following these coding guidelines, we have to call the underlying
> implementations that are specific to the iterators `release()` because
> they don't free the iterator itself. And because the generic part _does_
> free the iterator itself in addition to releasing its state, it has to
> be called `free()`.
> 

Make sense.

> Regarding the question why to even rename `ref_iterator_abort()` itself:
> this is done to avoid confusion going forward. Previously it really only
> had to be called when you actually wanted to abort an ongoing iteration
> over its yielded references. This is not the case anymore, and now you
> have to call it unconditionally after you're done with the iterator. So
> while the naming previously made sense, now it doesn't anymore.
> 

Good point, I didn't realise this part. Thanks for the detailed
explanation. I will continue to review the later patches. However, I
won't touch the oid part, because I am not familiar with this. By the
way, I think we miss out one thing in this patch:

We forget to free the dir iterator defined in the
"files-backend.c::files_fsck_refs_dir". I have just remembered that I
use dir iterator when checking the ref consistency.

Thanks,
Jialuo

> Patrick
Patrick Steinhardt Feb. 19, 2025, 1:17 p.m. UTC | #8
On Wed, Feb 19, 2025 at 09:06:58PM +0800, shejialuo wrote:
> On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote:
> > Regarding the question why to even rename `ref_iterator_abort()` itself:
> > this is done to avoid confusion going forward. Previously it really only
> > had to be called when you actually wanted to abort an ongoing iteration
> > over its yielded references. This is not the case anymore, and now you
> > have to call it unconditionally after you're done with the iterator. So
> > while the naming previously made sense, now it doesn't anymore.
> > 
> 
> Good point, I didn't realise this part. Thanks for the detailed
> explanation. I will continue to review the later patches. However, I
> won't touch the oid part, because I am not familiar with this. By the
> way, I think we miss out one thing in this patch:
> 
> We forget to free the dir iterator defined in the
> "files-backend.c::files_fsck_refs_dir". I have just remembered that I
> use dir iterator when checking the ref consistency.

Hm, good point. Why doesn't CI complain about this leak...? I'll
investigate, thanks for the hint!

Patrick
Patrick Steinhardt Feb. 19, 2025, 1:20 p.m. UTC | #9
On Wed, Feb 19, 2025 at 02:17:07PM +0100, Patrick Steinhardt wrote:
> On Wed, Feb 19, 2025 at 09:06:58PM +0800, shejialuo wrote:
> > On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote:
> > > Regarding the question why to even rename `ref_iterator_abort()` itself:
> > > this is done to avoid confusion going forward. Previously it really only
> > > had to be called when you actually wanted to abort an ongoing iteration
> > > over its yielded references. This is not the case anymore, and now you
> > > have to call it unconditionally after you're done with the iterator. So
> > > while the naming previously made sense, now it doesn't anymore.
> > > 
> > 
> > Good point, I didn't realise this part. Thanks for the detailed
> > explanation. I will continue to review the later patches. However, I
> > won't touch the oid part, because I am not familiar with this. By the
> > way, I think we miss out one thing in this patch:
> > 
> > We forget to free the dir iterator defined in the
> > "files-backend.c::files_fsck_refs_dir". I have just remembered that I
> > use dir iterator when checking the ref consistency.
> 
> Hm, good point. Why doesn't CI complain about this leak...? I'll
> investigate, thanks for the hint!

Wait, no, I had been looking at the wrong branch. We do free the
iterator:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 11a620ea11a..859f1c11941 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3837,6 +3820,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
                ret = error(_("failed to iterate over '%s'"), sb.buf);

 out:
+       dir_iterator_free(iter);
        strbuf_release(&sb);
        strbuf_release(&refname);
        return ret;

Patrick
shejialuo Feb. 19, 2025, 1:23 p.m. UTC | #10
On Wed, Feb 19, 2025 at 02:20:15PM +0100, Patrick Steinhardt wrote:
> On Wed, Feb 19, 2025 at 02:17:07PM +0100, Patrick Steinhardt wrote:
> > On Wed, Feb 19, 2025 at 09:06:58PM +0800, shejialuo wrote:
> > > On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote:
> > > > Regarding the question why to even rename `ref_iterator_abort()` itself:
> > > > this is done to avoid confusion going forward. Previously it really only
> > > > had to be called when you actually wanted to abort an ongoing iteration
> > > > over its yielded references. This is not the case anymore, and now you
> > > > have to call it unconditionally after you're done with the iterator. So
> > > > while the naming previously made sense, now it doesn't anymore.
> > > > 
> > > 
> > > Good point, I didn't realise this part. Thanks for the detailed
> > > explanation. I will continue to review the later patches. However, I
> > > won't touch the oid part, because I am not familiar with this. By the
> > > way, I think we miss out one thing in this patch:
> > > 
> > > We forget to free the dir iterator defined in the
> > > "files-backend.c::files_fsck_refs_dir". I have just remembered that I
> > > use dir iterator when checking the ref consistency.
> > 
> > Hm, good point. Why doesn't CI complain about this leak...? I'll
> > investigate, thanks for the hint!
> 
> Wait, no, I had been looking at the wrong branch. We do free the
> iterator:
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 11a620ea11a..859f1c11941 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3837,6 +3820,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
>                 ret = error(_("failed to iterate over '%s'"), sb.buf);
> 
>  out:
> +       dir_iterator_free(iter);
>         strbuf_release(&sb);
>         strbuf_release(&refname);
>         return ret;
> 
> Patrick

Oh, my mistake. I omit that part during review... Sorry here.
diff mbox series

Patch

diff --git a/builtin/clone.c b/builtin/clone.c
index fd001d800c6..ac3e84b2b18 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -426,6 +426,8 @@  static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		strbuf_setlen(src, src_len);
 		die(_("failed to iterate over '%s'"), src->buf);
 	}
+
+	dir_iterator_free(iter);
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
diff --git a/dir-iterator.c b/dir-iterator.c
index de619846f29..857e1d9bdaf 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -193,9 +193,9 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
 		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-			goto error_out;
+			return ITER_ERROR;
 		if (iter->levels_nr == 0)
-			goto error_out;
+			return ITER_ERROR;
 	}
 
 	/* Loop until we find an entry that we can give back to the caller. */
@@ -211,11 +211,11 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
 			if (ret < 0) {
 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
-					goto error_out;
+					return ITER_ERROR;
 				continue;
 			} else if (ret > 0) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -223,7 +223,7 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		} else {
 			if (level->entries_idx >= level->entries.nr) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -232,22 +232,21 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 		if (prepare_next_entry_data(iter, name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-				goto error_out;
+				return ITER_ERROR;
 			continue;
 		}
 
 		return ITER_OK;
 	}
-
-error_out:
-	dir_iterator_abort(dir_iterator);
-	return ITER_ERROR;
 }
 
-int dir_iterator_abort(struct dir_iterator *dir_iterator)
+void dir_iterator_free(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
 
+	if (!iter)
+		return;
+
 	for (; iter->levels_nr; iter->levels_nr--) {
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -266,7 +265,6 @@  int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	free(iter->levels);
 	strbuf_release(&iter->base.path);
 	free(iter);
-	return ITER_DONE;
 }
 
 struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
@@ -301,7 +299,7 @@  struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
 	return dir_iterator;
 
 error_out:
-	dir_iterator_abort(dir_iterator);
+	dir_iterator_free(dir_iterator);
 	errno = saved_errno;
 	return NULL;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 6d438809b6e..01f51f6bac1 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -27,10 +27,8 @@ 
  *             goto error_handler;
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
- *             if (want_to_stop_iteration()) {
- *                     ok = dir_iterator_abort(iter);
+ *             if (want_to_stop_iteration())
  *                     break;
- *             }
  *
  *             // Access information about the current path:
  *             if (S_ISDIR(iter->st.st_mode))
@@ -39,6 +37,7 @@ 
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     dir_iterator_release(iter);
  *
  * Callers are allowed to modify iter->path while they are working,
  * but they must restore it to its original contents before calling
@@ -107,11 +106,7 @@  struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
-/*
- * End the iteration before it has been exhausted. Free the
- * dir_iterator and any associated resources and return ITER_DONE. On
- * error, free the dir_iterator and return ITER_ERROR.
- */
-int dir_iterator_abort(struct dir_iterator *iterator);
+/* Free the dir_iterator and any associated resources. */
+void dir_iterator_free(struct dir_iterator *iterator);
 
 #endif
diff --git a/refs.c b/refs.c
index eaf41421f50..8eff60a2186 100644
--- a/refs.c
+++ b/refs.c
@@ -2476,6 +2476,7 @@  int refs_verify_refnames_available(struct ref_store *refs,
 {
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
+	struct ref_iterator *iter = NULL;
 	struct strset dirnames;
 	int ret = -1;
 
@@ -2552,7 +2553,6 @@  int refs_verify_refnames_available(struct ref_store *refs,
 		strbuf_addch(&dirname, '/');
 
 		if (!initial_transaction) {
-			struct ref_iterator *iter;
 			int ok;
 
 			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
@@ -2564,12 +2564,14 @@  int refs_verify_refnames_available(struct ref_store *refs,
 
 				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
 					    iter->refname, refname);
-				ref_iterator_abort(iter);
 				goto cleanup;
 			}
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
+
+			ref_iterator_free(iter);
+			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
@@ -2586,6 +2588,7 @@  int refs_verify_refnames_available(struct ref_store *refs,
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
 	strset_clear(&dirnames);
+	ref_iterator_free(iter);
 	return ret;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
index fbc4df08b43..a9786da4ba1 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -179,19 +179,18 @@  static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return res;
 }
 
-static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->abort(diter->iter);
-	trace_printf_key(&trace_refs, "iterator_abort: %d\n", res);
-	return res;
+	diter->iter->vtable->release(diter->iter);
+	trace_printf_key(&trace_refs, "iterator_abort\n");
 }
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
 	.peel = debug_ref_iterator_peel,
-	.abort = debug_ref_iterator_abort,
+	.release = debug_ref_iterator_release,
 };
 
 static struct ref_iterator *
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 29f08dced40..9511b6f3448 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -919,10 +919,6 @@  static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -935,23 +931,17 @@  static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int files_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
 }
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
 	.peel = files_ref_iterator_peel,
-	.abort = files_ref_iterator_abort,
+	.release = files_ref_iterator_release,
 };
 
 static struct ref_iterator *files_ref_iterator_begin(
@@ -1382,7 +1372,7 @@  static int should_pack_refs(struct files_ref_store *refs,
 				    iter->flags, opts))
 			refcount++;
 		if (refcount >= limit) {
-			ref_iterator_abort(iter);
+			ref_iterator_free(iter);
 			return 1;
 		}
 	}
@@ -1390,6 +1380,7 @@  static int should_pack_refs(struct files_ref_store *refs,
 	if (ret != ITER_DONE)
 		die("error while iterating over references");
 
+	ref_iterator_free(iter);
 	return 0;
 }
 
@@ -1456,6 +1447,7 @@  static int files_pack_refs(struct ref_store *ref_store,
 	packed_refs_unlock(refs->packed_ref_store);
 
 	prune_refs(refs, &refs_to_prune);
+	ref_iterator_free(iter);
 	strbuf_release(&err);
 	return 0;
 }
@@ -2303,9 +2295,6 @@  static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->dir_iterator = NULL;
-	if (ref_iterator_abort(ref_iterator) == ITER_ERROR)
-		ok = ITER_ERROR;
 	return ok;
 }
 
@@ -2315,23 +2304,17 @@  static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("ref_iterator_peel() called for reflog_iterator");
 }
 
-static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_reflog_iterator *iter =
 		(struct files_reflog_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->dir_iterator)
-		ok = dir_iterator_abort(iter->dir_iterator);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	dir_iterator_free(iter->dir_iterator);
 }
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
 	.peel = files_reflog_iterator_peel,
-	.abort = files_reflog_iterator_abort,
+	.release = files_reflog_iterator_release,
 };
 
 static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
@@ -3808,6 +3791,7 @@  static int files_fsck_refs_dir(struct ref_store *ref_store,
 		ret = error(_("failed to iterate over '%s'"), sb.buf);
 
 out:
+	dir_iterator_free(iter);
 	strbuf_release(&sb);
 	strbuf_release(&refname);
 	return ret;
diff --git a/refs/iterator.c b/refs/iterator.c
index d25e568bf0b..aaeff270437 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -21,9 +21,14 @@  int ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator->vtable->peel(ref_iterator, peeled);
 }
 
-int ref_iterator_abort(struct ref_iterator *ref_iterator)
+void ref_iterator_free(struct ref_iterator *ref_iterator)
 {
-	return ref_iterator->vtable->abort(ref_iterator);
+	if (ref_iterator) {
+		ref_iterator->vtable->release(ref_iterator);
+		/* Help make use-after-free bugs fail quickly: */
+		ref_iterator->vtable = NULL;
+		free(ref_iterator);
+	}
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
@@ -36,20 +41,13 @@  void base_ref_iterator_init(struct ref_iterator *iter,
 	iter->flags = 0;
 }
 
-void base_ref_iterator_free(struct ref_iterator *iter)
-{
-	/* Help make use-after-free bugs fail quickly: */
-	iter->vtable = NULL;
-	free(iter);
-}
-
 struct empty_ref_iterator {
 	struct ref_iterator base;
 };
 
-static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator)
+static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 {
-	return ref_iterator_abort(ref_iterator);
+	return ITER_DONE;
 }
 
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
@@ -58,16 +56,14 @@  static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("peel called for empty iterator");
 }
 
-static int empty_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 {
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
 	.peel = empty_ref_iterator_peel,
-	.abort = empty_ref_iterator_abort,
+	.release = empty_ref_iterator_release,
 };
 
 struct ref_iterator *empty_ref_iterator_begin(void)
@@ -151,11 +147,13 @@  static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
+			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
+			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -166,6 +164,7 @@  static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
+			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -179,9 +178,8 @@  static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 			iter->select(iter->iter0, iter->iter1, iter->cb_data);
 
 		if (selection == ITER_SELECT_DONE) {
-			return ref_iterator_abort(ref_iterator);
+			return ITER_DONE;
 		} else if (selection == ITER_SELECT_ERROR) {
-			ref_iterator_abort(ref_iterator);
 			return ITER_ERROR;
 		}
 
@@ -195,6 +193,7 @@  static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
+				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -211,7 +210,6 @@  static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 
 error:
-	ref_iterator_abort(ref_iterator);
 	return ITER_ERROR;
 }
 
@@ -227,28 +225,18 @@  static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(*iter->current, peeled);
 }
 
-static int merge_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0) {
-		if (ref_iterator_abort(iter->iter0) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	if (iter->iter1) {
-		if (ref_iterator_abort(iter->iter1) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
+	ref_iterator_free(iter->iter1);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
 	.peel = merge_ref_iterator_peel,
-	.abort = merge_ref_iterator_abort,
+	.release = merge_ref_iterator_release,
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
@@ -310,10 +298,10 @@  struct ref_iterator *overlay_ref_iterator_begin(
 	 * them.
 	 */
 	if (is_empty_ref_iterator(front)) {
-		ref_iterator_abort(front);
+		ref_iterator_free(front);
 		return back;
 	} else if (is_empty_ref_iterator(back)) {
-		ref_iterator_abort(back);
+		ref_iterator_free(back);
 		return front;
 	}
 
@@ -350,19 +338,10 @@  static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
 		int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
-
 		if (cmp < 0)
 			continue;
-
-		if (cmp > 0) {
-			/*
-			 * As the source iterator is ordered, we
-			 * can stop the iteration as soon as we see a
-			 * refname that comes after the prefix:
-			 */
-			ok = ref_iterator_abort(iter->iter0);
-			break;
-		}
+		if (cmp > 0)
+			return ITER_DONE;
 
 		if (iter->trim) {
 			/*
@@ -386,9 +365,6 @@  static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		return ITER_ERROR;
 	return ok;
 }
 
@@ -401,23 +377,18 @@  static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int prefix_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
+	ref_iterator_free(iter->iter0);
 	free(iter->prefix);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
 	.peel = prefix_ref_iterator_peel,
-	.abort = prefix_ref_iterator_abort,
+	.release = prefix_ref_iterator_release,
 };
 
 struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
@@ -453,20 +424,14 @@  int do_for_each_ref_iterator(struct ref_iterator *iter,
 	current_ref_iter = iter;
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		retval = fn(iter->refname, iter->referent, iter->oid, iter->flags, cb_data);
-		if (retval) {
-			/*
-			 * If ref_iterator_abort() returns ITER_ERROR,
-			 * we ignore that error in deference to the
-			 * callback function's return value.
-			 */
-			ref_iterator_abort(iter);
+		if (retval)
 			goto out;
-		}
 	}
 
 out:
 	current_ref_iter = old_ref_iter;
 	if (ok == ITER_ERROR)
-		return -1;
+		retval = -1;
+	ref_iterator_free(iter);
 	return retval;
 }
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a7b6f74b6e3..38a1956d1a8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -954,9 +954,6 @@  static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -976,23 +973,19 @@  static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	}
 }
 
-static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
 	release_snapshot(iter->snapshot);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
 	.peel = packed_ref_iterator_peel,
-	.abort = packed_ref_iterator_abort
+	.release = packed_ref_iterator_release,
 };
 
 static int jump_list_entry_cmp(const void *va, const void *vb)
@@ -1362,8 +1355,10 @@  static int write_with_updates(struct packed_ref_store *refs,
 	 */
 	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
 					 DO_FOR_EACH_INCLUDE_BROKEN);
-	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+	if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+		ref_iterator_free(iter);
 		iter = NULL;
+	}
 
 	i = 0;
 
@@ -1411,8 +1406,10 @@  static int write_with_updates(struct packed_ref_store *refs,
 				 * the iterator over the unneeded
 				 * value.
 				 */
-				if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+				if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+					ref_iterator_free(iter);
 					iter = NULL;
+				}
 				cmp = +1;
 			} else {
 				/*
@@ -1449,8 +1446,10 @@  static int write_with_updates(struct packed_ref_store *refs,
 					       peel_error ? NULL : &peeled))
 				goto write_error;
 
-			if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+			if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+				ref_iterator_free(iter);
 				iter = NULL;
+			}
 		} else if (is_null_oid(&update->new_oid)) {
 			/*
 			 * The update wants to delete the reference,
@@ -1499,9 +1498,7 @@  static int write_with_updates(struct packed_ref_store *refs,
 		    get_tempfile_path(refs->tempfile), strerror(errno));
 
 error:
-	if (iter)
-		ref_iterator_abort(iter);
-
+	ref_iterator_free(iter);
 	delete_tempfile(&refs->tempfile);
 	return -1;
 }
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 02f09e4df88..6457e02c1ea 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -409,7 +409,7 @@  static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		if (++level->index == level->dir->nr) {
 			/* This level is exhausted; pop up a level */
 			if (--iter->levels_nr == 0)
-				return ref_iterator_abort(ref_iterator);
+				return ITER_DONE;
 
 			continue;
 		}
@@ -452,21 +452,18 @@  static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return peel_object(iter->repo, ref_iterator->oid, peeled) ? -1 : 0;
 }
 
-static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
-
 	free((char *)iter->prefix);
 	free(iter->levels);
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
 	.peel = cache_ref_iterator_peel,
-	.abort = cache_ref_iterator_abort
+	.release = cache_ref_iterator_release,
 };
 
 struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index aaab711bb96..27ff822cf43 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -273,11 +273,11 @@  enum do_for_each_ref_flags {
  * the next reference and returns ITER_OK. The data pointed at by
  * refname and oid belong to the iterator; if you want to retain them
  * after calling ref_iterator_advance() again or calling
- * ref_iterator_abort(), you must make a copy. When the iteration has
+ * ref_iterator_free(), you must make a copy. When the iteration has
  * been exhausted, ref_iterator_advance() releases any resources
  * associated with the iteration, frees the ref_iterator object, and
  * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_abort(), which also frees the ref_iterator object and
+ * ref_iterator_free(), which also frees the ref_iterator object and
  * any associated resources. If there was an internal error advancing
  * to the next entry, ref_iterator_advance() aborts the iteration,
  * frees the ref_iterator, and returns ITER_ERROR.
@@ -292,10 +292,8 @@  enum do_for_each_ref_flags {
  *     struct ref_iterator *iter = ...;
  *
  *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
- *             if (want_to_stop_iteration()) {
- *                     ok = ref_iterator_abort(iter);
+ *             if (want_to_stop_iteration())
  *                     break;
- *             }
  *
  *             // Access information about the current reference:
  *             if (!(iter->flags & REF_ISSYMREF))
@@ -307,6 +305,7 @@  enum do_for_each_ref_flags {
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     ref_iterator_free(iter);
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -333,12 +332,8 @@  int ref_iterator_advance(struct ref_iterator *ref_iterator);
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled);
 
-/*
- * End the iteration before it has been exhausted, freeing the
- * reference iterator and any associated resources and returning
- * ITER_DONE. If the abort itself failed, return ITER_ERROR.
- */
-int ref_iterator_abort(struct ref_iterator *ref_iterator);
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
 
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
@@ -438,13 +433,6 @@  struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 void base_ref_iterator_init(struct ref_iterator *iter,
 			    struct ref_iterator_vtable *vtable);
 
-/*
- * Base class destructor for ref_iterators. Destroy the ref_iterator
- * part of iter and shallow-free the object. This is meant to be
- * called only by the destructors of derived classes.
- */
-void base_ref_iterator_free(struct ref_iterator *iter);
-
 /* Virtual function declarations for ref_iterators: */
 
 /*
@@ -463,15 +451,14 @@  typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
 
 /*
  * Implementations of this function should free any resources specific
- * to the derived class, then call base_ref_iterator_free() to clean
- * up and free the ref_iterator object.
+ * to the derived class.
  */
-typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator);
+typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
 	ref_iterator_peel_fn *peel;
-	ref_iterator_abort_fn *abort;
+	ref_iterator_release_fn *release;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 2a90e7cb391..06543f79c64 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -711,17 +711,10 @@  static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -740,7 +733,7 @@  static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return -1;
 }
 
-static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
@@ -751,14 +744,12 @@  static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
 	.peel = reftable_ref_iterator_peel,
-	.abort = reftable_ref_iterator_abort
+	.release = reftable_ref_iterator_release,
 };
 
 static int qsort_strcmp(const void *va, const void *vb)
@@ -2017,17 +2008,10 @@  static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -2038,21 +2022,19 @@  static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSE
 	return -1;
 }
 
-static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_reflog_iterator *iter =
 		(struct reftable_reflog_iterator *)ref_iterator;
 	reftable_log_record_release(&iter->log);
 	reftable_iterator_destroy(&iter->iter);
 	strbuf_release(&iter->last_name);
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
 	.peel = reftable_reflog_iterator_peel,
-	.abort = reftable_reflog_iterator_abort
+	.release = reftable_reflog_iterator_release,
 };
 
 static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 6b297bd7536..8d46e8ba409 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -53,6 +53,7 @@  int cmd__dir_iterator(int argc, const char **argv)
 		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
 		       diter->path.buf);
 	}
+	dir_iterator_free(diter);
 
 	if (iter_status != ITER_DONE) {
 		printf("dir_iterator_advance failure\n");