[v3,1/3] ref-filter: add worktreepath atom

Message ID	20181216215759.24011-2-nbelakovski@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> From: nbelakovski@gmail.com To: git@vger.kernel.org Cc: peff@peff.net, rafa.almas@gmail.com, gitster@pobox.com, avarab@gmail.com, Nickolai Belakovski <nbelakovski@gmail.com> Subject: [PATCH v3 1/3] ref-filter: add worktreepath atom Date: Sun, 16 Dec 2018 13:57:57 -0800 Message-Id: <20181216215759.24011-2-nbelakovski@gmail.com> In-Reply-To: <20181216215759.24011-1-nbelakovski@gmail.com> References: <20181111235831.44824-1-nbelakovski@gmail.com> <20181216215759.24011-1-nbelakovski@gmail.com> Sender: git-owner@vger.kernel.org Precedence: bulk
Series	[v3,1/3] ref-filter: add worktreepath atom \| expand [v3,0/3] [v3,1/3] ref-filter: add worktreepath atom [v3,2/3] branch: Mark and color a branch differently if it is checked out in a linked worktree [v3,3/3] branch: Add an extra verbose output displaying worktree path for refs checked out in a lin…

Nickolai Belakovski Dec. 16, 2018, 9:57 p.m. UTC

From: Nickolai Belakovski <nbelakovski@gmail.com>

Add an atom proving the path of the linked worktree where this ref is
checked out, if it is checked out in any linked worktrees, and empty
string otherwise.
---
 Documentation/git-for-each-ref.txt |  4 +++
 ref-filter.c                       | 70 ++++++++++++++++++++++++++++++++++++++
 t/t6302-for-each-ref-filter.sh     | 15 ++++++++
 3 files changed, 89 insertions(+)

Jeff King Dec. 18, 2018, 5:22 p.m. UTC | #1

On Sun, Dec 16, 2018 at 01:57:57PM -0800, nbelakovski@gmail.com wrote:

> From: Nickolai Belakovski <nbelakovski@gmail.com>
> 
> Add an atom proving the path of the linked worktree where this ref is
> checked out, if it is checked out in any linked worktrees, and empty
> string otherwise.

I stumbled over the word "proving" here. Maybe "showing" would be more
clear?

> diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
> index 901faef1bf..9590f7beab 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -209,6 +209,10 @@ symref::
>  	`:lstrip` and `:rstrip` options in the same way as `refname`
>  	above.
>  
> +worktreepath::
> +	The absolute path to the worktree in which the ref is checked
> +	out, if it is checked out in any linked worktree. ' ' otherwise.
> +

Normally single-quotes are used in asciidoc to emphasize text, and the
quotes aren't passed through. Asciidoc (and asciidoctor) do seem to
render the literal quotes here, which is good. I wonder if it would be
more clear to just write it out, though, like:

  ...any linked worktree. Otherwise, replaced with a single space.

Also, why are we replacing it with a single space? Wouldn't the empty
string be more customary (and work with the other "if empty, then do
this" formatting options)?

> @@ -34,6 +36,8 @@ static struct ref_msg {
>  	"ahead %d, behind %d"
>  };
>  
> +static struct worktree ** worktrees;

Minor style nit: we put the "*" in a pointer declaration next to the
variable name, without intervening whitespace. Like:

  static struct worktree **worktrees;

> @@ -75,6 +79,12 @@ static struct expand_data {
>  	struct object_info info;
>  } oi, oi_deref;
>  
> +struct reftoworktreeinfo_entry {
> +    struct hashmap_entry ent; // must be the first member!
> +    char * ref; // key into map
> +    struct worktree * wt;
> +};

A few style nits:

  - the "*" space thing from above (it's in other places below, too, but
    I won't point out each)

  - we prefer "/* */" comments, even for single-liners

  - since we do all-lowercase identifiers, use more underscores to break
    things up. E.g., ref_to_worktree_entry.

Here we store the refname as a separate variable, but then point to the
worktree itself to access wt->path. Why do we treat these differently?
I.e., I'd expect to see either:

  1. Each entry holding a single worktree object, and using its head_ref
     and path fields, like:

       struct ref_to_worktree_entry {
               struct hashmap_entry ent; /* must be first */
	       struct worktree *wt;
       };
       ....
       entry = xmalloc(sizeof(*entry));
       entry->wt = wt;
       hashmap_entry_init(entry, strhash(wt->head_ref));
       ...
       strbuf_addstr(&out, result->wt->path);

  2. Each entry containing just the bits it needs, like:

       struct ref_to_worktree_entry {
               struct hashmap_entry ent; /* must be first */
	       char *ref;
	       char *path;
       };
       ...
       /*
        * We could use FLEXPTR_ALLOC_STR() here, but it doesn't actually
	* support holding _two_ strings. Separate allocations probably
	* aren't a huge deal here, since there are only a handful of
	* worktrees.
	*/
       entry = xmalloc(sizeof(*entry));
       entry->ref = wt->head_ref;
       entry->path = wt->path;
       hashmap_entry_init(entry, strhash(entry->ref));
       ...
       strbuf_addstr(&out, result->path);

I think the first one is strictly preferable unless we're worried about
the lifetime of the "struct worktree" going away. I don't think that's
an issue, though; they are ours until we call free_worktrees().

> @@ -114,6 +124,7 @@ static struct used_atom {
>  		} objectname;
>  		struct refname_atom refname;
>  		char *head;
> +		struct hashmap reftoworktreeinfo_map;
>  	} u;
>  } *used_atom;

This uses one map for each %(worktree) we use. But won't they all be the
same? It would ideally be associated with the ref-filter.  There's no
ref-filter context struct to hold this kind of data, just static globals
in ref-filter.c (including this used_atom struct!). That's something
we'll probably need to fix in the long run, but I think it would be
reasonable to just have:

  static struct hashmap ref_to_worktree_map;

next to the declaration of used_atom_cnt, need_symref, etc. And then
those can all eventually get moved into a struct together.

> @@ -461,6 +497,7 @@ static struct {
>  	{ "flag", SOURCE_NONE },
>  	{ "HEAD", SOURCE_NONE, FIELD_STR, head_atom_parser },
>  	{ "color", SOURCE_NONE, FIELD_STR, color_atom_parser },
> +	{ "worktreepath", SOURCE_NONE, FIELD_STR, worktree_atom_parser },
>  	{ "align", SOURCE_NONE, FIELD_STR, align_atom_parser },
>  	{ "end", SOURCE_NONE },
>  	{ "if", SOURCE_NONE, FIELD_STR, if_atom_parser },

Marking as SOURCE_NONE makes sense.

> +static const char * get_worktree_info(const struct used_atom *atom, const struct ref_array_item *ref)
> +{
> +	struct strbuf val = STRBUF_INIT;
> +	struct reftoworktreeinfo_entry * entry;
> +	struct reftoworktreeinfo_entry * lookup_result;
> +
> +	FLEXPTR_ALLOC_STR(entry, ref, ref->refname);
> +	hashmap_entry_init(entry, strhash(entry->ref));
> +	lookup_result = hashmap_get(&(atom->u.reftoworktreeinfo_map), entry, NULL);
> +	free(entry);

We shouldn't need to do an allocation just for a lookup. That's what the
extra "keydata" parameter is for in the comparison function. And I guess
this is what led you to have "char *ref" in the struct, rather than
reusing wt->head_ref (because you don't have a "struct worktree" here).

You should be able to do it like this:

  struct hashmap_entry entry;
  struct ref_to_worktree_entry *result;

  hashmap_entry_init(entry, strhash(ref->refname));
  result = hashmap_get(&ref_to_worktree_map, &entry, ref->refname));
  ...

and then your comparison function would look like this:

  int ref_to_worktree_hashcmp(const void *data,
                              const void *entry,
			      const void *entry_or_key,
			      const void *keydata)
  {
          const struct ref_to_worktree_entry *a = entry;
          const struct ref_to_worktree_entry *b = entry;
	  if (keydata)
	          return strcmp(a->wt->head_ref, keydata);
	  else
	          return strcmp(a->wt->head_ref, b->wt->head_ref);
  }

If you're thinking that this API is totally confusing and hard to figure
out, I agree. It's optimized to avoid extra allocations. I wish we had a
better one for simple cases (especially string->string mappings like
this).

Speaking of comparison functions, I didn't see one in your patch. Don't
you need to pass one to hashmap_init?

> +	if (lookup_result)
> +	{
> +		if (!strncmp(atom->name, "worktreepath", strlen(atom->name)))
> +			strbuf_addstr(&val, lookup_result->wt->path);
> +	}
> +	else
> +		strbuf_addstr(&val, " ");

What's this extra strncmp about? If we're _not_ a worktreepath atom,
we'd still do the lookup only to put nothing in the string?

I think we'd only call this function when populate_value() sees a
worktreepath atom, though:

> @@ -1537,6 +1596,10 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
>  
>  		if (starts_with(name, "refname"))
>  			refname = get_refname(atom, ref);
> +		else if (starts_with(name, "worktreepath")) {
> +			v->s = get_worktree_info(atom, ref);
> +			continue;
> +		}

So it would be OK to drop the check of atom->name again inside
get_worktree_info().

> @@ -2013,7 +2076,14 @@ void ref_array_clear(struct ref_array *array)
>  	int i;
>  
>  	for (i = 0; i < used_atom_cnt; i++)
> +	{
> +		if (!strncmp(used_atom[i].name, "worktreepath", strlen("worktreepath")))
> +		{
> +			hashmap_free(&(used_atom[i].u.reftoworktreeinfo_map), 1);
> +			free_worktrees(worktrees);
> +		}

And if we move the mapping out to a static global, then this only has to
be done once, not once per atom. In fact, I think this could double-free
"worktrees" with your current patch if you have two "%(worktree)"
placeholders, since "worktrees" already is a global.

> diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> index fc067ed672..add70a4c3e 100755
> --- a/t/t6302-for-each-ref-filter.sh
> +++ b/t/t6302-for-each-ref-filter.sh
> @@ -441,4 +441,19 @@ test_expect_success '--merged is incompatible with --no-merged' '
>  	test_must_fail git for-each-ref --merged HEAD --no-merged HEAD
>  '
>  
> +test_expect_success '"add" a worktree' '
> +	mkdir worktree_dir &&
> +	git worktree add -b master_worktree worktree_dir master
> +'
> +
> +test_expect_success 'validate worktree atom' '
> +	cat >expect <<-\EOF &&
> +	master: checked out in a worktree
> +	master_worktree: checked out in a worktree
> +	side: not checked out in a worktree
> +	EOF
> +	git for-each-ref --format="%(refname:short): %(if)%(worktreepath)%(then)checked out in a worktree%(else)not checked out in a worktree%(end)" refs/heads/ >actual &&
> +	test_cmp expect actual
> +'

It's probably worth testing that the path we get is actually sane, too.
I.e., expect something more like:

   cat >expect <<-\EOF
   master: $PWD
   master: $PWD/worktree
   side: not checked out
   EOF
   git for-each-ref \
     --format="%(refname:short): %(if)%(worktreepath)%(then)%(worktreepath)%(else)not checked %out%(end)

(I wish there was a way to avoid that really long line, but I don't
think there is).

-Peff

Nickolai Belakovski Dec. 20, 2018, 7:09 a.m. UTC | #2

On Tue, Dec 18, 2018 at 9:22 AM Jeff King <peff@peff.net> wrote:
>
> On Sun, Dec 16, 2018 at 01:57:57PM -0800, nbelakovski@gmail.com wrote:
>
> > From: Nickolai Belakovski <nbelakovski@gmail.com>
> >
> > Add an atom proving the path of the linked worktree where this ref is
> > checked out, if it is checked out in any linked worktrees, and empty
> > string otherwise.
>
> I stumbled over the word "proving" here. Maybe "showing" would be more
> clear?

Oops, providing

> > +worktreepath::
> > +     The absolute path to the worktree in which the ref is checked
> > +     out, if it is checked out in any linked worktree. ' ' otherwise.
> > +
>
> Also, why are we replacing it with a single space? Wouldn't the empty
> string be more customary (and work with the other "if empty, then do
> this" formatting options)?

I was just following what was done for HEAD, but overall I agree that
empty is preferable to single space, will change.

> Minor style nit: we put the "*" in a pointer declaration next to the
> variable name, without intervening whitespace. Like:
>
>   static struct worktree **worktrees;

Gotcha, will do, thanks for pointing it out.


To sum up the hashmap comments:
-I hadn't thought to re-use the head_ref of worktree as the key.
That's clever. I like the readability of having separate entries for
key and value, but I can see the benefit of not having to do an extra
allocation. I can make up for the readability hit with a comment.
-Actually, for any valid use case there will only be one instance of
the map since the entries of used_atom are cached, but regardless it
makes sense to keep per-atom info in used_atom and global context
somewhere else, so I'll make that change to make it a static variable
outside of used_atom.
-Will change the lookup logic to remove the extra allocation. Since
I'm letting the hashmap use its internal comparison function on the
hash, I don't need to provide a comparison function.

> What's this extra strncmp about? If we're _not_ a worktreepath atom,
> we'd still do the lookup only to put nothing in the string?

Leftover from an earlier iteration where I was going to support
getting more info out of the worktree struct. I decided to limit scope
to just the info I really needed for the branch change. I left it like
this because I thought it would make the code more readable for
someone who wanted to come in and add that extra info, but I think
you're right that it ends up just reading kind of awkwardly.

>
> > @@ -2013,7 +2076,14 @@ void ref_array_clear(struct ref_array *array)
> >       int i;
> >
> >       for (i = 0; i < used_atom_cnt; i++)
> > +     {
> > +             if (!strncmp(used_atom[i].name, "worktreepath", strlen("worktreepath")))
> > +             {
> > +                     hashmap_free(&(used_atom[i].u.reftoworktreeinfo_map), 1);
> > +                     free_worktrees(worktrees);
> > +             }
>
> And if we move the mapping out to a static global, then this only has to
> be done once, not once per atom. In fact, I think this could double-free
> "worktrees" with your current patch if you have two "%(worktree)"
> placeholders, since "worktrees" already is a global.

Only if someone put a colon on one of the %(worktree) atoms, otherwise
they're all cached, but as you say moot point anyway if the map is
moved outside the used_atom structure.

>
> It's probably worth testing that the path we get is actually sane, too.
> I.e., expect something more like:
>
>    cat >expect <<-\EOF
>    master: $PWD
>    master: $PWD/worktree
>    side: not checked out
>    EOF
>    git for-each-ref \
>      --format="%(refname:short): %(if)%(worktreepath)%(then)%(worktreepath)%(else)not checked %out%(end)
>
> (I wish there was a way to avoid that really long line, but I don't
> think there is).
>

Yea good call, can do.

Thanks for all the feedback, will try to turn these around quickly.

Jeff King Dec. 20, 2018, 2:59 p.m. UTC | #3

On Wed, Dec 19, 2018 at 11:09:59PM -0800, Nickolai Belakovski wrote:

> > Also, why are we replacing it with a single space? Wouldn't the empty
> > string be more customary (and work with the other "if empty, then do
> > this" formatting options)?
> 
> I was just following what was done for HEAD, but overall I agree that
> empty is preferable to single space, will change.

Ah, right, that makes more sense. I do still think for %(HEAD) it's a
little different because it is "+" or a single space, so always one
character.  Here we have some value or not, and in the "not" case for
such things we usually give an empty string (e.g., for %(push),
%(upstream), etc).

> To sum up the hashmap comments:
> -I hadn't thought to re-use the head_ref of worktree as the key.
> That's clever. I like the readability of having separate entries for
> key and value, but I can see the benefit of not having to do an extra
> allocation. I can make up for the readability hit with a comment.

Thanks, that makes sense.

> -Actually, for any valid use case there will only be one instance of
> the map since the entries of used_atom are cached, but regardless it
> makes sense to keep per-atom info in used_atom and global context
> somewhere else, so I'll make that change to make it a static variable
> outside of used_atom.

Ah, right, I forgot there was some magic around used_atom. I do still
agree that the separate static global makes things a little simpler.

> -Will change the lookup logic to remove the extra allocation. Since
> I'm letting the hashmap use its internal comparison function on the
> hash, I don't need to provide a comparison function.

I don't think that works. The default function is always_equal(), which
will treat two entries equal if they have the same hash value. I.e., any
collisions would be considered a match.

> Thanks for all the feedback, will try to turn these around quickly.

Great, thanks! I'll be on vacation for the next two weeks, so I may be
very slow to look at the next iteration. :)

-Peff

Nickolai Belakovski Dec. 24, 2018, 8:47 a.m. UTC | #4

From: Nickolai Belakovski <nbelakovski@gmail.com>

> I don't think that works. The default function is always_equal(), which
> will treat two entries equal if they have the same hash value. I.e., any
> collisions would be considered a match.

You're absolutely right. I've added a compare function, but I left out the
functionality for it to work with an entry passed in as a key. Doing so would
mean the user would have to allocate a worktree struct, which just seems silly
when the ref is all that's needed (and also defeats the purpose of avoiding
extra allocations).

And while most of the hashmap API seems OK, yea, this is definitely awful. It
feels like it should just be able to take a key and return either an entry or
NULL, and do away with entry_or_key and equals_function_data.

Travis-CI results: https://travis-ci.org/nbelakovski/git/builds/471787317

Nickolai Belakovski (3):
  ref-filter: add worktreepath atom
  branch: Mark and color a branch differently if it is checked out in a
    linked worktree
  branch: Add an extra verbose output displaying worktree path for refs
    checked out in a linked worktree

 Documentation/git-for-each-ref.txt |  5 +++
 builtin/branch.c                   | 16 ++++++---
 ref-filter.c                       | 72 +++++++++++++++++++++++++++++++++++++-
 t/t3200-branch.sh                  |  8 ++---
 t/t3203-branch-output.sh           | 21 +++++++++++
 t/t6302-for-each-ref-filter.sh     | 15 ++++++++
 6 files changed, 128 insertions(+), 9 deletions(-)

Jeff King Jan. 3, 2019, 5:22 a.m. UTC | #5

On Mon, Dec 24, 2018 at 12:47:53AM -0800, nbelakovski@gmail.com wrote:

> From: Nickolai Belakovski <nbelakovski@gmail.com>
> 
> > I don't think that works. The default function is always_equal(), which
> > will treat two entries equal if they have the same hash value. I.e., any
> > collisions would be considered a match.
> 
> You're absolutely right. I've added a compare function, but I left out the
> functionality for it to work with an entry passed in as a key. Doing so would
> mean the user would have to allocate a worktree struct, which just seems silly
> when the ref is all that's needed (and also defeats the purpose of avoiding
> extra allocations).

Unfortunately, that doesn't quite work. :)

Your compare function has to handle _both_ cases: two keys, or one key
and a keydata.

The former may be called when the hashmap has to compare two entries
internally (e.g., when it has to re-bucket all of the entries after a
resize). Your tests likely wouldn't run into this case in practice,
since you'd only have a handful of worktrees. But if you added, say,
thousands of worktrees and we had to grow the hash midway through the
process, it would segfault.  So you do need to handle the case when your
keydata is NULL.

In theory it would also be used for comparisons if we used a more clever
data structure to hold entries within a bucket. But since we just use a
linked list and linear search for now, we don't.

> And while most of the hashmap API seems OK, yea, this is definitely awful. It
> feels like it should just be able to take a key and return either an entry or
> NULL, and do away with entry_or_key and equals_function_data.

In your case, yeah, equals_function_data is not used at all (but there
are a few call-sites which need it to avoid relying on global data).

But the entry_or_key (coupled with keydata) is the magic that lets the
same function be used for both lookups as well as internal entry
comparisons.

I think having two separate comparison functions would make this a lot
more clear, though likely at the cost of having more boilerplate.

-Peff

[v3,1/3] ref-filter: add worktreepath atom

Commit Message

Comments

Patch