[0/2] Ensure unique worktree ids across repositories

Message ID	20241128-wt_unique_ids-v1-0-30345d010e43@pm.me (mailing list archive)
Headers	show Received: from mail-4316.protonmail.ch (mail-4316.protonmail.ch [185.70.43.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C8B6E79CD for <git@vger.kernel.org>; Fri, 29 Nov 2024 02:44:40 +0000 (UTC) Date: Fri, 29 Nov 2024 02:44:24 +0000 To: git@vger.kernel.org From: Caleb White <cdwhite3@pm.me> Cc: Caleb White <cdwhite3@pm.me> Subject: [PATCH 0/2] Ensure unique worktree ids across repositories Message-ID: <20241128-wt_unique_ids-v1-0-30345d010e43@pm.me> Feedback-ID: 31210263:user:proton Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable
Series	Ensure unique worktree ids across repositories \| expand [0/2] Ensure unique worktree ids across repositories [1/2] worktree: add worktree with unique suffix [2/2] worktree: rename worktree id during worktree move

Caleb White Nov. 29, 2024, 2:44 a.m. UTC

The `es/worktree-repair-copied` topic added support for repairing a
worktree from a copy scenario. I noted[1,2] that the topic added the
ability for a repository to "take over" a worktree from another
repository if the worktree_id matched a worktree inside the current
repository which can happen if two repositories use the same worktree name.

This series teaches Git to create worktrees with a unique suffix so
that the worktree_id is unique across all repositories even if they have
the same name. For example creating a worktree `develop` would look like:

    foo/
    ├── .git/worktrees/develop-5445874156/
    └── develop/
    bar/
    ├── .git/worktrees/develop-1549518426/
    └── develop/

The actual worktree directory name is still `develop`, but the
worktree_id is unique and prevents the "take over" scenario. The suffix
is given by the `git_rand()` function, but I'm open to suggestions if
there's a better random or hashing function to use.

[1]: https://lore.kernel.org/git/20241008153035.71178-1-cdwhite3@pm.me/
[2]: https://lore.kernel.org/git/r4zmcET41Skr_FMop47AKd7cms9E8bKPSvHuAUpnYavzKEY6JybJta0_7GfuYB0q-gD-XNcvh5VDTfiT3qthGKjqhS1sbT4M2lUABynOz2Q=@pm.me/

Signed-off-by: Caleb White <cdwhite3@pm.me>
---
Caleb White (2):
      worktree: add worktree with unique suffix
      worktree: rename worktree id during worktree move

 Documentation/git-worktree.txt     |  5 +-
 builtin/worktree.c                 | 30 ++++++++++++
 t/t0035-safe-bare-repository.sh    |  4 +-
 t/t0600-reffiles-backend.sh        | 10 ++--
 t/t0601-reffiles-pack-refs.sh      |  4 +-
 t/t0610-reftable-basics.sh         | 54 +++++++++++-----------
 t/t1407-worktree-ref-store.sh      |  4 +-
 t/t1410-reflog.sh                  | 10 ++--
 t/t1415-worktree-refs.sh           | 26 +++++------
 t/t1450-fsck.sh                    | 14 +++---
 t/t1500-rev-parse.sh               |  6 +--
 t/t2400-worktree-add.sh            | 51 +++++++++++----------
 t/t2401-worktree-prune.sh          | 20 ++++----
 t/t2403-worktree-move.sh           | 38 ++++++++--------
 t/t2405-worktree-submodule.sh      | 10 ++--
 t/t2406-worktree-repair.sh         | 93 ++++++++++++++++++++++++--------------
 t/t2407-worktree-heads.sh          | 27 +++++------
 t/t3200-branch.sh                  | 10 ++--
 t/t5304-prune.sh                   |  2 +-
 t/t7412-submodule-absorbgitdirs.sh |  4 +-
 20 files changed, 239 insertions(+), 183 deletions(-)
---
base-commit: 090d24e9af6e9f59c3f7bee97c42bb1ae3c7f559
change-id: 20241127-wt_unique_ids-1ffd7ea0bb19
prerequisite-change-id: 20241025-wt_relative_options-afa41987bc32:v5
prerequisite-patch-id: 179410e257e8eedf100f4f9faa9467cbbba4d61b
prerequisite-patch-id: 56ffe0afeadd511c9eef5f548a371659b040acab
prerequisite-patch-id: 809c1314e5dfa966f4f3d73b52f286f8aa89370f
prerequisite-patch-id: cf5f9491c8f8e58d1e0e103a5f8c64c55f2896e3
prerequisite-patch-id: 3d3bb3cc81d3030b1d27c39fdb4cf0e383937f89
prerequisite-patch-id: 62a09496d98d78a6bd1f9150ba887ee72359c7ee
prerequisite-patch-id: 5527e4b745963dd4fa08029491fcbfe3d91d5104
prerequisite-patch-id: bf433443e90939a493fa586de30938f78cb77020

Best regards,

Caleb White Nov. 29, 2024, 2:49 a.m. UTC | #1

On Thu Nov 28, 2024 at 8:44 PM CST, Caleb White wrote:
> The `es/worktree-repair-copied` topic added support for repairing a
> worktree from a copy scenario. I noted[1,2] that the topic added the
> ability for a repository to "take over" a worktree from another
> repository if the worktree_id matched a worktree inside the current
> repository which can happen if two repositories use the same worktree name.
>
> This series teaches Git to create worktrees with a unique suffix so
> that the worktree_id is unique across all repositories even if they have
> the same name. For example creating a worktree `develop` would look like:
>
>     foo/
>     ├── .git/worktrees/develop-5445874156/
>     └── develop/
>     bar/
>     ├── .git/worktrees/develop-1549518426/
>     └── develop/
>
> The actual worktree directory name is still `develop`, but the
> worktree_id is unique and prevents the "take over" scenario. The suffix
> is given by the `git_rand()` function, but I'm open to suggestions if
> there's a better random or hashing function to use.
>
> [1]: https://lore.kernel.org/git/20241008153035.71178-1-cdwhite3@pm.me/
> [2]: https://lore.kernel.org/git/r4zmcET41Skr_FMop47AKd7cms9E8bKPSvHuAUpnYavzKEY6JybJta0_7GfuYB0q-gD-XNcvh5VDTfiT3qthGKjqhS1sbT4M2lUABynOz2Q=@pm.me/

I forgot to mention, but the base for this series is obtained by merging
the `cw/worktree-extension` topic (2024-11-26, 20241125-wt_relative_options-v5-0-356d122ff3db@pm.me)
onto 090d24e9af.

Best,

Caleb

Junio C Hamano Nov. 29, 2024, 3:14 a.m. UTC | #2

Caleb White <cdwhite3@pm.me> writes:

> The `es/worktree-repair-copied` topic added support for repairing a
> worktree from a copy scenario. I noted[1,2] that the topic added the
> ability for a repository to "take over" a worktree from another
> repository if the worktree_id matched a worktree inside the current
> repository which can happen if two repositories use the same worktree name.

Problem worth solving.  Another would be to fail if the worktree ID
proposed to be used is already in use, but the ID is supposed to be
almost invisible (unless the user is doing some adiministrative work
on the repository), generating a unique ID is a good approach.

> This series teaches Git to create worktrees with a unique suffix so
> that the worktree_id is unique across all repositories even if they have
> the same name. For example creating a worktree `develop` would look like:
>
>     foo/
>     ├── .git/worktrees/develop-5445874156/
>     └── develop/
>     bar/
>     ├── .git/worktrees/develop-1549518426/
>     └── develop/
>
> The actual worktree directory name is still `develop`, but the
> worktree_id is unique and prevents the "take over" scenario. The suffix
> is given by the `git_rand()` function, but I'm open to suggestions if
> there's a better random or hashing function to use.

I do not think it matters much what hash/rand algorithm is chosen.
What is important is what you do when the suffix suggested by that
chosen algorithm collides with an existing worktree ID.  IOW, there
is no way a "random" can guarantee uniqueness.  Attempt to create and
if you find a collision, retry from the generation of another suffix,
or something like that, is necessary.

And as long as that "make sure it is unique" part is done right, it
does not even have to be random.  Just generating a sequence number
and using the first one that is available would work as well.

Thanks.

Caleb White Nov. 29, 2024, 3:31 a.m. UTC | #3

On Thu Nov 28, 2024 at 9:14 PM CST, Junio C Hamano wrote:
> Caleb White <cdwhite3@pm.me> writes:
>
>> The `es/worktree-repair-copied` topic added support for repairing a
>> worktree from a copy scenario. I noted[1,2] that the topic added the
>> ability for a repository to "take over" a worktree from another
>> repository if the worktree_id matched a worktree inside the current
>> repository which can happen if two repositories use the same worktree name.
>
> Problem worth solving.  Another would be to fail if the worktree ID
> proposed to be used is already in use, but the ID is supposed to be
> almost invisible (unless the user is doing some adiministrative work
> on the repository), generating a unique ID is a good approach.

There's already a `while` loop that tries incrementing the proposed id
(e.g., `develop1` if `develop` is taken). However, this is on
a per-repository basis, so there's no way to know what's already in use
in another repository. The problem arises when trying to run `worktree
repair` on a worktree that has a matching id (like `develop`) from
another repository. The goal here is that the same worktree name can be
created in different repositories and the id will be (effectively)
unique.

>> This series teaches Git to create worktrees with a unique suffix so
>> that the worktree_id is unique across all repositories even if they have
>> the same name. For example creating a worktree `develop` would look like:
>>
>>     foo/
>>     ├── .git/worktrees/develop-5445874156/
>>     └── develop/
>>     bar/
>>     ├── .git/worktrees/develop-1549518426/
>>     └── develop/
>>
>> The actual worktree directory name is still `develop`, but the
>> worktree_id is unique and prevents the "take over" scenario. The suffix
>> is given by the `git_rand()` function, but I'm open to suggestions if
>> there's a better random or hashing function to use.
>
> I do not think it matters much what hash/rand algorithm is chosen.
> What is important is what you do when the suffix suggested by that
> chosen algorithm collides with an existing worktree ID.  IOW, there
> is no way a "random" can guarantee uniqueness.  Attempt to create and
> if you find a collision, retry from the generation of another suffix,
> or something like that, is necessary.
>
> And as long as that "make sure it is unique" part is done right, it
> does not even have to be random.  Just generating a sequence number
> and using the first one that is available would work as well.

The `while` loop mentioned earlier still exists, so in the (unlikely)
event that the suffix collides with an existing worktree_id, it will
increment the suffix and try again.

Best,

Caleb

shejialuo Nov. 29, 2024, 11:05 a.m. UTC | #4

On Fri, Nov 29, 2024 at 02:44:24AM +0000, Caleb White wrote:
> The `es/worktree-repair-copied` topic added support for repairing a
> worktree from a copy scenario. I noted[1,2] that the topic added the
> ability for a repository to "take over" a worktree from another
> repository if the worktree_id matched a worktree inside the current
> repository which can happen if two repositories use the same worktree name.
> 

I somehow understand why we need to append a hash or a random number
into the current "id" field of the "struct worktree *". But I don't see
a _strong_ reason.

I think we need to figure out the following things:

    1. In what situation, there is a possibility that the user will
    repair the worktree from another repository.
    2. Why we need to hash to make sure the worktree is unique? From the
    expression, my intuitive way is that we need to distinguish whether
    the repository is the same.

> This series teaches Git to create worktrees with a unique suffix so
> that the worktree_id is unique across all repositories even if they have
> the same name. For example creating a worktree `develop` would look like:
> 
>     foo/
>     ├── .git/worktrees/develop-5445874156/
>     └── develop/
>     bar/
>     ├── .git/worktrees/develop-1549518426/
>     └── develop/
> 
> The actual worktree directory name is still `develop`, but the
> worktree_id is unique and prevents the "take over" scenario. The suffix
> is given by the `git_rand()` function, but I'm open to suggestions if
> there's a better random or hashing function to use.
> 

The actual worktree directory name is unchanged. But we have changed the
"worktree->id" and the git filesystem. Now, we will encounter much
trouble. The main reason is that we make the worktree name and worktree
id inconsistent. There are many tools which assume that worktree id is
the worktree name. In other words, there is no difference between the
worktree id and worktree name at current.

Let me give you an example.

The user could use "git update-ref" to update a ref from another ref.
So, a situation is that the user want to update(create) the
main-worktree ref from linked-worktree ref.

    ```sh
    git init repo && cd repo
    git commit --allow-empty -m initial
    git branch branch-1
    git worktree add ./worktree-1 branch-1
    (cd worktree-1 && git update-ref refs/worktree/branch-2 HEAD)
    ```
By the above operations, we will create a worktree-specified ref under
the ".git/worktrees/<worktree_id>/refs/worktree".

What if we want to this in the main worktree:

    ```sh
    git update-ref refs/heads/branch-3 \
        worktrees/worktree-1/refs/worktree/branch-2
    ```

So, with this patch, we make worktree-id not the same as worktree name.
If we do this. "git update-ref" cannot find the
".git/worktrees/worktree-1/refs/worktree/branch-2". This is because the
filesystem is changed to ".git/worktrees/worktree-1-<hash>/...".

If we use hash / random number to distinguish. We also need to change
the ref-related code to ignore the "-<hash>". It's impossible to let the
user type the extra hash / random number. However, this requires a lot
of effort.

So, I think we need a _strong_ reason to indicate that we must append
some chars into worktree id to do this.

Thanks,
Jialuo

Caleb White Nov. 29, 2024, 3:58 p.m. UTC | #5

On Fri Nov 29, 2024 at 5:05 AM CST, shejialuo wrote:
> On Fri, Nov 29, 2024 at 02:44:24AM +0000, Caleb White wrote:
>> The `es/worktree-repair-copied` topic added support for repairing a
>> worktree from a copy scenario. I noted[1,2] that the topic added the
>> ability for a repository to "take over" a worktree from another
>> repository if the worktree_id matched a worktree inside the current
>> repository which can happen if two repositories use the same worktree name.
>
> I somehow understand why we need to append a hash or a random number
> into the current "id" field of the "struct worktree *". But I don't see
> a _strong_ reason.
>
> I think we need to figure out the following things:
>
>     1. In what situation, there is a possibility that the user will
>     repair the worktree from another repository.

This can happen if a user accidentally mistypes a directory name when
executing `git worktree repair`. Or if a user copies a worktree from one
repository and the executes `git worktree repair` in a different repository.

The point is to prevent this from happening before it becomes a problem.

>     2. Why we need to hash to make sure the worktree is unique? From the
>     expression, my intuitive way is that we need to distinguish whether
>     the repository is the same.

During `worktree repair`, an "inferred backlink" is established by
parsing the worktree id from the `.git` file and checking if that
matches an existing worktree id in the current repository. If so, then
the link is established even if that worktree belonged to another
repository. The easiest way I thought to prevent this from happening is
to ensure that no "inferred backlink" can be established by using an
effectively unique worktree_id. So two repositories can have the same
worktree name (e.g., `develop`) but the actual ids would be different.
Additionally, if the ids **do match**, then this would be indicative of
a copy situation like the original topic addressed.

How do you propose to distinguish whether the repository is the same?

>> This series teaches Git to create worktrees with a unique suffix so
>> that the worktree_id is unique across all repositories even if they have
>> the same name. For example creating a worktree `develop` would look like:
>>
>>     foo/
>>     ├── .git/worktrees/develop-5445874156/
>>     └── develop/
>>     bar/
>>     ├── .git/worktrees/develop-1549518426/
>>     └── develop/
>>
>> The actual worktree directory name is still `develop`, but the
>> worktree_id is unique and prevents the "take over" scenario. The suffix
>> is given by the `git_rand()` function, but I'm open to suggestions if
>> there's a better random or hashing function to use.
>
> The actual worktree directory name is unchanged. But we have changed the
> "worktree->id" and the git filesystem. Now, we will encounter much
> trouble. The main reason is that we make the worktree name and worktree
> id inconsistent. There are many tools which assume that worktree id is
> the worktree name.

Do you have any sources for these tools? Because I'm not aware of any.
Any tool that needs the actual worktree id should be extracting the id
from the `.git` file and not using the worktree directory name.

> In other words, there is no difference between the worktree id and
> worktree name at current.

This is NOT true, there are several scenarios where they can currently differ:

1. git currently will append a number to the worktree name if it already
   exists in the repository. This means that you can create a `develop`
   worktree and wind up with an id of `develop2`.
2. git does not currently rename the id during a `worktree move`. This
   means that I can create a worktree with a name of `develop` and then
   execute `git worktree move develop master` and the id will still be
   `develop` while the directory is now `master`.
3. a user can manually move/rename the directory and then repair the
   worktree and wind up in the same situation as 2).

The suffix is not a new concept, I've just changed it from occasionally
adding a suffix to always adding a (unique) suffix. The worktree id has
never been guaranteed to be the same as the worktree name, and as
mentioned above, any tools/scripts/intelligence that need the id should
always be extracting it from the `.git` file.

One thing we can do is to add the worktree id to the `git worktree list`
output so that users can see the id and the name together. This would
make it easier for users/tools obtain the id if they need it without
having to parse the `.git` file.

> Let me give you an example.
>
> So, with this patch, we make worktree-id not the same as worktree name.
> If we do this. "git update-ref" cannot find the
> ".git/worktrees/worktree-1/refs/worktree/branch-2". This is because the
> filesystem is changed to ".git/worktrees/worktree-1-<hash>/...".

Yes this is expected because the worktree id is not the same as the
name.

> If we use hash / random number to distinguish. We also need to change
> the ref-related code to ignore the "-<hash>". It's impossible to let the
> user type the extra hash / random number. However, this requires a lot
> of effort.

This would be possible as long as the given worktree slug is unambiguous.
However, I think this is more trouble than it's worth.

> So, I think we need a _strong_ reason to indicate that we must append
> some chars into worktree id to do this.

As stated above, git already appends a number to the worktree name if it
collides with an existing directory. Always appending a unique suffix
should actually make things simpler / more consistent in the long run
because the worktree id will always be different from the name instead
of occasionally being different.

Best,

Caleb

shejialuo Nov. 29, 2024, 5:32 p.m. UTC | #6

On Fri, Nov 29, 2024 at 03:58:16PM +0000, Caleb White wrote:
> On Fri Nov 29, 2024 at 5:05 AM CST, shejialuo wrote:
> > On Fri, Nov 29, 2024 at 02:44:24AM +0000, Caleb White wrote:
> >> The `es/worktree-repair-copied` topic added support for repairing a
> >> worktree from a copy scenario. I noted[1,2] that the topic added the
> >> ability for a repository to "take over" a worktree from another
> >> repository if the worktree_id matched a worktree inside the current
> >> repository which can happen if two repositories use the same worktree name.
> >
> > I somehow understand why we need to append a hash or a random number
> > into the current "id" field of the "struct worktree *". But I don't see
> > a _strong_ reason.
> >
> > I think we need to figure out the following things:
> >
> >     1. In what situation, there is a possibility that the user will
> >     repair the worktree from another repository.
> 
> This can happen if a user accidentally mistypes a directory name when
> executing `git worktree repair`. Or if a user copies a worktree from one
> repository and the executes `git worktree repair` in a different repository.
> 
> The point is to prevent this from happening before it becomes a problem.
> 

If we could prevent this, this is great. But the current implementation
will cause much burden for the refs-related code. In other words, my
concern is that if we use this way, we may put a lot of efforts to
change the ref-related codes to adapt into this new design. So, we
should think carefully whether we should put so many efforts to solve
this corner case by this way.

I somehow know the background, because we have allowed both the absolute
and relative paths for worktree, we need to handle this problem.

I agree with your motivation, but we need to consider the burden
introduced. This is my point.

> >     2. Why we need to hash to make sure the worktree is unique? From the
> >     expression, my intuitive way is that we need to distinguish whether
> >     the repository is the same.
> 
> During `worktree repair`, an "inferred backlink" is established by
> parsing the worktree id from the `.git` file and checking if that
> matches an existing worktree id in the current repository. If so, then
> the link is established even if that worktree belonged to another
> repository. The easiest way I thought to prevent this from happening is
> to ensure that no "inferred backlink" can be established by using an
> effectively unique worktree_id. So two repositories can have the same
> worktree name (e.g., `develop`) but the actual ids would be different.
> Additionally, if the ids **do match**, then this would be indicative of
> a copy situation like the original topic addressed.
> 

OK.

> How do you propose to distinguish whether the repository is the same?
> 

I am sorry that I cannot tell you the answer. I haven't dived into the
worktree. I just gave my thoughts above.

> >> This series teaches Git to create worktrees with a unique suffix so
> >> that the worktree_id is unique across all repositories even if they have
> >> the same name. For example creating a worktree `develop` would look like:
> >>
> >>     foo/
> >>     ├── .git/worktrees/develop-5445874156/
> >>     └── develop/
> >>     bar/
> >>     ├── .git/worktrees/develop-1549518426/
> >>     └── develop/
> >>
> >> The actual worktree directory name is still `develop`, but the
> >> worktree_id is unique and prevents the "take over" scenario. The suffix
> >> is given by the `git_rand()` function, but I'm open to suggestions if
> >> there's a better random or hashing function to use.
> >
> > The actual worktree directory name is unchanged. But we have changed the
> > "worktree->id" and the git filesystem. Now, we will encounter much
> > trouble. The main reason is that we make the worktree name and worktree
> > id inconsistent. There are many tools which assume that worktree id is
> > the worktree name.
> 
> Do you have any sources for these tools? Because I'm not aware of any.
> Any tool that needs the actual worktree id should be extracting the id
> from the `.git` file and not using the worktree directory name.
> 

I am sorry that my words may confuse you here. These tools are just the
git builtins. Such as "git update-ref" and "git symbolic-ref".

> > In other words, there is no difference between the worktree id and
> > worktree name at current.
> 
> This is NOT true, there are several scenarios where they can currently differ:
> 
> 1. git currently will append a number to the worktree name if it already
>    exists in the repository. This means that you can create a `develop`
>    worktree and wind up with an id of `develop2`.
> 2. git does not currently rename the id during a `worktree move`. This
>    means that I can create a worktree with a name of `develop` and then
>    execute `git worktree move develop master` and the id will still be
>    `develop` while the directory is now `master`.
> 3. a user can manually move/rename the directory and then repair the
>    worktree and wind up in the same situation as 2).
> 

Thanks for this information. I am not familiar with the worktree. I just
have learned the worktree when doing something related to the refs. So,
it is true that worktree id and worktree name are not the same.

But that's wired. For any situation above, it won't cause any trouble
when the user is in the worktree. Because the user could just use
"refs/worktree/foo" to indicate the worktree specified ref without
knowing the worktree id.

So if a user moves the path from "worktree_1" to "worktree_2" and wants
to do the following operation in the main worktree:

    ```sh
    git update-ref refs/heads/master \
        worktrees/worktree_2/refs/worktree/foo
    ```
It will encounter error, because the worktree id is still the
"worktree_1".

But when the user create the worktree using the following command:

    ```sh
    git worktree add ./worktree_1 branch-1
    ```

The name `worktree_1`(path) will be the worktree id. So, when moving the
path "worktree_1" to "worktree_2". The user won't know the detail about
the worktree id. The user (like me) will just think that "worktree_2"
will be the new worktree id.

That does not make sense. It's impossible for the user know the mapping
between the worktree name and worktree id.

Hi Eric and Junio, I am wondering what the purpose of worktree is. When
I implemented the consistent check for ref contents, Junio has told me
that the main-worktree symref could point to the linked-worktree ref.
The linked-worktree symref could also point to another linked-worktree
ref.

But from above, it gives a feeling that the purpose of the worktree is
to make the environment totally independent. And we shouldn't allow any
ref interactions between the main-worktree and linked-worktree. But we
DO allow at now by using "git symbolic-ref" and "git update-ref".

> The suffix is not a new concept, I've just changed it from occasionally
> adding a suffix to always adding a (unique) suffix. The worktree id has
> never been guaranteed to be the same as the worktree name, and as
> mentioned above, any tools/scripts/intelligence that need the id should
> always be extracting it from the `.git` file.
> 

Yes, I agree. Thanks.

> One thing we can do is to add the worktree id to the `git worktree list`
> output so that users can see the id and the name together. This would
> make it easier for users/tools obtain the id if they need it without
> having to parse the `.git` file.
> 

This is a good idea, if we need to use this way. And we may also need to
add documentation.

> > Let me give you an example.
> >
> > So, with this patch, we make worktree-id not the same as worktree name.
> > If we do this. "git update-ref" cannot find the
> > ".git/worktrees/worktree-1/refs/worktree/branch-2". This is because the
> > filesystem is changed to ".git/worktrees/worktree-1-<hash>/...".
> 
> Yes this is expected because the worktree id is not the same as the
> name.
> 
> > If we use hash / random number to distinguish. We also need to change
> > the ref-related code to ignore the "-<hash>". It's impossible to let the
> > user type the extra hash / random number. However, this requires a lot
> > of effort.
> 
> This would be possible as long as the given worktree slug is unambiguous.
> However, I think this is more trouble than it's worth.
> 

As you can see from above, the main problem is that we allow some ref
interactions between the main-worktree and linked-worktrees.

The reason why I use this example is that I am afraid that the user will
create a new symbolic ref in the main worktree which points to the
linked-worktree ref.

When the user create a worktree, it is natural to think that there is no
difference between the worktree id and the worktree name. And will use
"worktrees/<worktree_id>/refs/worktree/foo" to access the ref in the
worktree.

And It's OK to add hash / random number to the worktree id if worktree
is totally independent. However, at now, we allow some interactions
between the main-worktree and linked worktrees (even the linked
worktree and another linked worktree).

When doing above, the user must know the worktree id. But at now
worktree name is not the same as worktree id.

So, I don't know...

> > So, I think we need a _strong_ reason to indicate that we must append
> > some chars into worktree id to do this.
> 
> As stated above, git already appends a number to the worktree name if it
> collides with an existing directory. Always appending a unique suffix
> should actually make things simpler / more consistent in the long run
> because the worktree id will always be different from the name instead
> of occasionally being different.
> 

In general, I agree with your way now after knowing the truth that
worktree name is not the same as the worktree id. However, it seems that
the situation is a little complicated.

> Best,
> 
> Caleb

Thanks,
Jialuo

Caleb White Nov. 29, 2024, 8:39 p.m. UTC | #7

On Fri Nov 29, 2024 at 11:32 AM CST, shejialuo wrote:
> On Fri, Nov 29, 2024 at 03:58:16PM +0000, Caleb White wrote:
>> On Fri Nov 29, 2024 at 5:05 AM CST, shejialuo wrote:
>> > I somehow understand why we need to append a hash or a random number
>> > into the current "id" field of the "struct worktree *". But I don't see
>> > a _strong_ reason.
>> >
>> > I think we need to figure out the following things:
>> >
>> >     1. In what situation, there is a possibility that the user will
>> >     repair the worktree from another repository.
>>
>> This can happen if a user accidentally mistypes a directory name when
>> executing `git worktree repair`. Or if a user copies a worktree from one
>> repository and the executes `git worktree repair` in a different repository.
>>
>> The point is to prevent this from happening before it becomes a problem.
>
> If we could prevent this, this is great. But the current implementation
> will cause much burden for the refs-related code. In other words, my
> concern is that if we use this way, we may put a lot of efforts to
> change the ref-related codes to adapt into this new design. So, we
> should think carefully whether we should put so many efforts to solve
> this corner case by this way.

I'm not sure that any of the ref related code would need to change,
all the tests are passing and the ref code uses the worktree id
for anything worktree related.

> I somehow know the background, because we have allowed both the absolute
> and relative paths for worktree, we need to handle this problem.

This edge case exists when using absolute paths, so this is not
a consequence of now allowing relative paths.

>> >     2. Why we need to hash to make sure the worktree is unique? From the
>> >     expression, my intuitive way is that we need to distinguish whether
>> >     the repository is the same.
>> 
>> How do you propose to distinguish whether the repository is the same?
>
> I am sorry that I cannot tell you the answer. I haven't dived into the
> worktree. I just gave my thoughts above.

That's fine, I'm just not really sure about how such a thing would be
done. To me, the unique id is the easiest and most intuitive. The
repository generally has no knowledge of other repositories on the
system (as far as I am aware).

>> Do you have any sources for these tools? Because I'm not aware of any.
>> Any tool that needs the actual worktree id should be extracting the id
>> from the `.git` file and not using the worktree directory name.
>
> I am sorry that my words may confuse you here. These tools are just the
> git builtins. Such as "git update-ref" and "git symbolic-ref".

Ah, I understand now---I thought you were talking about external tools.
All internal git builtins use the worktree id, the actual directory name
of the worktree is inconsequential. So nothing should need to change.

>> > In other words, there is no difference between the worktree id and
>> > worktree name at current.
>>
>> This is NOT true, there are several scenarios where they can currently differ:
>>
>> 1. git currently will append a number to the worktree name if it already
>>    exists in the repository. This means that you can create a `develop`
>>    worktree and wind up with an id of `develop2`.
>> 2. git does not currently rename the id during a `worktree move`. This
>>    means that I can create a worktree with a name of `develop` and then
>>    execute `git worktree move develop master` and the id will still be
>>    `develop` while the directory is now `master`.
>> 3. a user can manually move/rename the directory and then repair the
>>    worktree and wind up in the same situation as 2).
>>
>
> Thanks for this information. I am not familiar with the worktree. I just
> have learned the worktree when doing something related to the refs. So,
> it is true that worktree id and worktree name are not the same.
>
> But that's wired. For any situation above, it won't cause any trouble
> when the user is in the worktree. Because the user could just use
> "refs/worktree/foo" to indicate the worktree specified ref without
> knowing the worktree id.
>
> So if a user moves the path from "worktree_1" to "worktree_2" and wants
> to do the following operation in the main worktree:
>
>     ```sh
>     git update-ref refs/heads/master \
>         worktrees/worktree_2/refs/worktree/foo
>     ```
> It will encounter error, because the worktree id is still the
> "worktree_1".
>
> But when the user create the worktree using the following command:
>
>     ```sh
>     git worktree add ./worktree_1 branch-1
>     ```
>
> The name `worktree_1`(path) will be the worktree id. So, when moving the
> path "worktree_1" to "worktree_2". The user won't know the detail about
> the worktree id. The user (like me) will just think that "worktree_2"
> will be the new worktree id.
>
> That does not make sense. It's impossible for the user know the mapping
> between the worktree name and worktree id.

It's not impossible, the user can always look in the `.git` file.
However, I have added the id to the `worktree list` output to more easily
associate the id with the worktree.

>> One thing we can do is to add the worktree id to the `git worktree list`
>> output so that users can see the id and the name together. This would
>> make it easier for users/tools obtain the id if they need it without
>> having to parse the `.git` file.
>>
> This is a good idea, if we need to use this way. And we may also need to
> add documentation.

I have implemented this and added documentation.

> As you can see from above, the main problem is that we allow some ref
> interactions between the main-worktree and linked-worktrees.
>
> The reason why I use this example is that I am afraid that the user will
> create a new symbolic ref in the main worktree which points to the
> linked-worktree ref.
>
> When the user create a worktree, it is natural to think that there is no
> difference between the worktree id and the worktree name. And will use
> "worktrees/<worktree_id>/refs/worktree/foo" to access the ref in the
> worktree.
>
> And It's OK to add hash / random number to the worktree id if worktree
> is totally independent. However, at now, we allow some interactions
> between the main-worktree and linked worktrees (even the linked
> worktree and another linked worktree).
>
> When doing above, the user must know the worktree id. But at now
> worktree name is not the same as worktree id.
>
> So, I don't know...

Interactions between the main and linked (and linked with other linked)
worktrees is not a limiting factor here.

>> As stated above, git already appends a number to the worktree name if it
>> collides with an existing directory. Always appending a unique suffix
>> should actually make things simpler / more consistent in the long run
>> because the worktree id will always be different from the name instead
>> of occasionally being different.
>
> In general, I agree with your way now after knowing the truth that
> worktree name is not the same as the worktree id. However, it seems that
> the situation is a little complicated.

I see where you're coming from, but I do not believe that the situation
is any more complicated than it already is. All internal git code uses
the worktree id which is already not guaranteed to be the same as the
worktree directory name, so nothing is changing on that front.

Best,

Caleb

[0/2] Ensure unique worktree ids across repositories

Message

Comments