mbox series

[v2,0/3] Ensure unique worktree ids across repositories

Message ID 20241129-wt_unique_ids-v2-0-ff444e9e625a@pm.me (mailing list archive)
Headers show
Series Ensure unique worktree ids across repositories | expand

Message

Caleb White Nov. 29, 2024, 10:37 p.m. UTC
The `es/worktree-repair-copied` topic added support for repairing a
worktree from a copy scenario. I noted[1,2] that the topic added the
ability for a repository to "take over" a worktree from another
repository if the worktree_id matched a worktree inside the current
repository which can happen if two repositories use the same worktree name.

This series teaches Git to create worktrees with a unique suffix so
that the worktree_id is unique across all repositories even if they have
the same name. For example creating a worktree `develop` would look like:

    foo/
    ├── .git/worktrees/develop-5445874156/
    └── develop/
    bar/
    ├── .git/worktrees/develop-1549518426/
    └── develop/

The actual worktree directory name is still `develop`, but the
worktree_id is unique and prevents the "take over" scenario. The suffix
is given by the `git_rand()` function which should be sufficient to
ensure that the suffix is effectively unique (the likelihood of
a collision with the same name and suffix is extremely low).

During a `worktree move` the worktree directory is moved/renamed but the
repository under `worktrees/<id>` is not updated. For example, moving
`develop` to `master` results in

    foo/
    ├── .git/worktrees/develop-5445874156/
    └── master/

This works because the linking files still point to the correct
repository, but this is a little weird. This series teaches Git to also
move/rename the repository / worktree id during a `move` so that the
structure now looks like:

    foo/
    ├── .git/worktrees/master-1565465986/
    └── master/

Note that a new unique suffix is assigned to reduce the complexity of
trying to parse and reuse the existing suffix.

Additionally, this series teaches `worktree list` to output the worktree
id in the verbose and porcelain modes, which allows users and scripts to
more easily obtain the id for a given worktree.

[1]: https://lore.kernel.org/git/20241008153035.71178-1-cdwhite3@pm.me/
[2]: https://lore.kernel.org/git/r4zmcET41Skr_FMop47AKd7cms9E8bKPSvHuAUpnYavzKEY6JybJta0_7GfuYB0q-gD-XNcvh5VDTfiT3qthGKjqhS1sbT4M2lUABynOz2Q=@pm.me/

Signed-off-by: Caleb White <cdwhite3@pm.me>
---
The base for this series is obtained by merging the `cw/worktree-extension`
topic (2024-11-26, 20241125-wt_relative_options-v5-0-356d122ff3db@pm.me)
onto 090d24e9af.

Changes in v2:
- Add the worktree id to `worktree list` output
- Updated cover letter
- Link to v1: https://lore.kernel.org/r/20241128-wt_unique_ids-v1-0-30345d010e43@pm.me

---
Caleb White (3):
      worktree: add worktree with unique suffix
      worktree: rename worktree id during worktree move
      worktree: add id to `worktree list` output

 Documentation/git-worktree.txt     | 17 +++++--
 builtin/worktree.c                 | 35 ++++++++++++++
 t/t0035-safe-bare-repository.sh    |  4 +-
 t/t0600-reffiles-backend.sh        | 10 ++--
 t/t0601-reffiles-pack-refs.sh      |  4 +-
 t/t0610-reftable-basics.sh         | 54 +++++++++++-----------
 t/t1407-worktree-ref-store.sh      |  4 +-
 t/t1410-reflog.sh                  | 10 ++--
 t/t1415-worktree-refs.sh           | 26 +++++------
 t/t1450-fsck.sh                    | 14 +++---
 t/t1500-rev-parse.sh               |  6 +--
 t/t2400-worktree-add.sh            | 51 +++++++++++----------
 t/t2401-worktree-prune.sh          | 20 ++++----
 t/t2402-worktree-list.sh           | 16 ++++---
 t/t2403-worktree-move.sh           | 38 ++++++++--------
 t/t2405-worktree-submodule.sh      | 10 ++--
 t/t2406-worktree-repair.sh         | 93 ++++++++++++++++++++++++--------------
 t/t2407-worktree-heads.sh          | 27 +++++------
 t/t3200-branch.sh                  | 10 ++--
 t/t5304-prune.sh                   |  2 +-
 t/t7412-submodule-absorbgitdirs.sh |  4 +-
 21 files changed, 265 insertions(+), 190 deletions(-)
---
base-commit: 090d24e9af6e9f59c3f7bee97c42bb1ae3c7f559
change-id: 20241127-wt_unique_ids-1ffd7ea0bb19
prerequisite-change-id: 20241025-wt_relative_options-afa41987bc32:v6
prerequisite-patch-id: 179410e257e8eedf100f4f9faa9467cbbba4d61b
prerequisite-patch-id: 56ffe0afeadd511c9eef5f548a371659b040acab
prerequisite-patch-id: 809c1314e5dfa966f4f3d73b52f286f8aa89370f
prerequisite-patch-id: cf5f9491c8f8e58d1e0e103a5f8c64c55f2896e3
prerequisite-patch-id: 9884b33822bf4c7c3b89a9a6b49d4ab44c2670e7
prerequisite-patch-id: 62a09496d98d78a6bd1f9150ba887ee72359c7ee
prerequisite-patch-id: 5527e4b745963dd4fa08029491fcbfe3d91d5104
prerequisite-patch-id: bf433443e90939a493fa586de30938f78cb77020

Best regards,

Comments

Randall S. Becker Nov. 29, 2024, 10:54 p.m. UTC | #1
On November 29, 2024 5:38 PM, Caleb White writes:
>The `es/worktree-repair-copied` topic added support for repairing a worktree from
>a copy scenario. I noted[1,2] that the topic added the ability for a repository to
>"take over" a worktree from another repository if the worktree_id matched a
>worktree inside the current repository which can happen if two repositories use the
>same worktree name.
>
>This series teaches Git to create worktrees with a unique suffix so that the
>worktree_id is unique across all repositories even if they have the same name. For
>example creating a worktree `develop` would look like:
>
>    foo/
>    ├── .git/worktrees/develop-5445874156/
>    └── develop/
>    bar/
>    ├── .git/worktrees/develop-1549518426/
>    └── develop/
>
>The actual worktree directory name is still `develop`, but the worktree_id is unique
>and prevents the "take over" scenario. The suffix is given by the `git_rand()` function
>which should be sufficient to ensure that the suffix is effectively unique (the
>likelihood of a collision with the same name and suffix is extremely low).
>
>During a `worktree move` the worktree directory is moved/renamed but the
>repository under `worktrees/<id>` is not updated. For example, moving `develop` to
>`master` results in
>
>    foo/
>    ├── .git/worktrees/develop-5445874156/
>    └── master/
>
>This works because the linking files still point to the correct repository, but this is a
>little weird. This series teaches Git to also move/rename the repository / worktree id
>during a `move` so that the structure now looks like:
>
>    foo/
>    ├── .git/worktrees/master-1565465986/
>    └── master/
>
>Note that a new unique suffix is assigned to reduce the complexity of trying to parse
>and reuse the existing suffix.
>
>Additionally, this series teaches `worktree list` to output the worktree id in the
>verbose and porcelain modes, which allows users and scripts to more easily obtain
>the id for a given worktree.
>
>[1]: https://lore.kernel.org/git/20241008153035.71178-1-cdwhite3@pm.me/
>[2]:
>https://lore.kernel.org/git/r4zmcET41Skr_FMop47AKd7cms9E8bKPSvHuAUpnYav
>zKEY6JybJta0_7GfuYB0q-gD-
>XNcvh5VDTfiT3qthGKjqhS1sbT4M2lUABynOz2Q=@pm.me/
>
>Signed-off-by: Caleb White <cdwhite3@pm.me>
>---
>The base for this series is obtained by merging the `cw/worktree-extension` topic
>(2024-11-26, 20241125-wt_relative_options-v5-0-356d122ff3db@pm.me)
>onto 090d24e9af.
>
>Changes in v2:
>- Add the worktree id to `worktree list` output
>- Updated cover letter
>- Link to v1: https://lore.kernel.org/r/20241128-wt_unique_ids-v1-0-
>30345d010e43@pm.me
>
>---
>Caleb White (3):
>      worktree: add worktree with unique suffix
>      worktree: rename worktree id during worktree move
>      worktree: add id to `worktree list` output
>
> Documentation/git-worktree.txt     | 17 +++++--
> builtin/worktree.c                 | 35 ++++++++++++++
> t/t0035-safe-bare-repository.sh    |  4 +-
> t/t0600-reffiles-backend.sh        | 10 ++--
> t/t0601-reffiles-pack-refs.sh      |  4 +-
> t/t0610-reftable-basics.sh         | 54 +++++++++++-----------
> t/t1407-worktree-ref-store.sh      |  4 +-
> t/t1410-reflog.sh                  | 10 ++--
> t/t1415-worktree-refs.sh           | 26 +++++------
> t/t1450-fsck.sh                    | 14 +++---
> t/t1500-rev-parse.sh               |  6 +--
> t/t2400-worktree-add.sh            | 51 +++++++++++----------
> t/t2401-worktree-prune.sh          | 20 ++++----
> t/t2402-worktree-list.sh           | 16 ++++---
> t/t2403-worktree-move.sh           | 38 ++++++++--------
> t/t2405-worktree-submodule.sh      | 10 ++--
> t/t2406-worktree-repair.sh         | 93 ++++++++++++++++++++++++--------------
> t/t2407-worktree-heads.sh          | 27 +++++------
> t/t3200-branch.sh                  | 10 ++--
> t/t5304-prune.sh                   |  2 +-
> t/t7412-submodule-absorbgitdirs.sh |  4 +-
> 21 files changed, 265 insertions(+), 190 deletions(-)
>---
>base-commit: 090d24e9af6e9f59c3f7bee97c42bb1ae3c7f559
>change-id: 20241127-wt_unique_ids-1ffd7ea0bb19
>prerequisite-change-id: 20241025-wt_relative_options-afa41987bc32:v6
>prerequisite-patch-id: 179410e257e8eedf100f4f9faa9467cbbba4d61b
>prerequisite-patch-id: 56ffe0afeadd511c9eef5f548a371659b040acab
>prerequisite-patch-id: 809c1314e5dfa966f4f3d73b52f286f8aa89370f
>prerequisite-patch-id: cf5f9491c8f8e58d1e0e103a5f8c64c55f2896e3
>prerequisite-patch-id: 9884b33822bf4c7c3b89a9a6b49d4ab44c2670e7
>prerequisite-patch-id: 62a09496d98d78a6bd1f9150ba887ee72359c7ee
>prerequisite-patch-id: 5527e4b745963dd4fa08029491fcbfe3d91d5104
>prerequisite-patch-id: bf433443e90939a493fa586de30938f78cb77020

General comment on this series: Is there a mechanism of preserving existing
functionality for those of us who have existing scripts that depend on the
existing branch and worktree naming?
Caleb White Nov. 29, 2024, 11:13 p.m. UTC | #2
On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
> General comment on this series: Is there a mechanism of preserving existing
> functionality for those of us who have existing scripts that depend on the
> existing branch and worktree naming?

Existing worktrees will continue to work as they do now. The only change
is the worktree id for new worktrees. However, there's not an option to
preserve the existing behavior for new worktrees (nor do I think there
should be).

As stated in the v1 threads, the worktree id is already not guaranteed
to be equal to the worktree/branch name (there's several ways that this
can occur), so it's buggy behavior for scripts to make this assumption.
Any script that needs the worktree id should be parsing it from the 
`.git` file, `git rev-parse --git-dir`, or (with the changes in this
series) `git worktree list`.

Best,

Caleb
Randall S. Becker Nov. 29, 2024, 11:17 p.m. UTC | #3
On November 29, 2024 6:14 PM, Caleb White writes:
>On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
>> General comment on this series: Is there a mechanism of preserving
>> existing functionality for those of us who have existing scripts that
>> depend on the existing branch and worktree naming?
>
>Existing worktrees will continue to work as they do now. The only change is the
>worktree id for new worktrees. However, there's not an option to preserve the
>existing behavior for new worktrees (nor do I think there should be).

I do not agree. Companies that have existing scripts should have some way to
preserve their investment. Just saying "No more worktrees for you" is not
really considerate.

>As stated in the v1 threads, the worktree id is already not guaranteed to be equal to
>the worktree/branch name (there's several ways that this can occur), so it's buggy
>behavior for scripts to make this assumption.
>Any script that needs the worktree id should be parsing it from the `.git` file, `git rev-
>parse --git-dir`, or (with the changes in this
>series) `git worktree list`.

I agree, but I think having some kind of notice beyond one release is important, rather
than pulling the rug out from under people.

Just my suggestion that there should be a migration period of this critical function.
--Randall
Caleb White Nov. 29, 2024, 11:29 p.m. UTC | #4
On Fri Nov 29, 2024 at 5:17 PM CST, rsbecker wrote:
> On November 29, 2024 6:14 PM, Caleb White writes:
>>On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
>>> General comment on this series: Is there a mechanism of preserving
>>> existing functionality for those of us who have existing scripts that
>>> depend on the existing branch and worktree naming?
>>
>>Existing worktrees will continue to work as they do now. The only change is the
>>worktree id for new worktrees. However, there's not an option to preserve the
>>existing behavior for new worktrees (nor do I think there should be).
>
> I do not agree. Companies that have existing scripts should have some way to
> preserve their investment. Just saying "No more worktrees for you" is not
> really considerate.

How exactly are your scripts depending on the worktree id? There are
very few reasons a script might need to know the worktree id, and
I suspect that there's some confusion here. The worktree name is still
used with the `git worktree` commands, so there no change on that front.

Best,

Caleb
Randall S. Becker Nov. 29, 2024, 11:44 p.m. UTC | #5
On November 29, 2024 6:29 PM, Caleb White wrote:
>On Fri Nov 29, 2024 at 5:17 PM CST, rsbecker wrote:
>> On November 29, 2024 6:14 PM, Caleb White writes:
>>>On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
>>>> General comment on this series: Is there a mechanism of preserving
>>>> existing functionality for those of us who have existing scripts
>>>> that depend on the existing branch and worktree naming?
>>>
>>>Existing worktrees will continue to work as they do now. The only
>>>change is the worktree id for new worktrees. However, there's not an
>>>option to preserve the existing behavior for new worktrees (nor do I think there
>should be).
>>
>> I do not agree. Companies that have existing scripts should have some
>> way to preserve their investment. Just saying "No more worktrees for
>> you" is not really considerate.
>
>How exactly are your scripts depending on the worktree id? There are very few
>reasons a script might need to know the worktree id, and I suspect that there's
>some confusion here. The worktree name is still used with the `git worktree`
>commands, so there no change on that front.

The graphic describing this showed the id in addition to the worktree name.
During cleanup detection, the directory of the worktree is significant. If that
Observation is wrong, I retract all this.
Caleb White Nov. 30, 2024, 12:08 a.m. UTC | #6
On Fri Nov 29, 2024 at 5:44 PM CST, rsbecker wrote:
> On November 29, 2024 6:29 PM, Caleb White wrote:
>>On Fri Nov 29, 2024 at 5:17 PM CST, rsbecker wrote:
>>> On November 29, 2024 6:14 PM, Caleb White writes:
>>>>On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
>>>>> General comment on this series: Is there a mechanism of preserving
>>>>> existing functionality for those of us who have existing scripts
>>>>> that depend on the existing branch and worktree naming?
>>>>
>>>>Existing worktrees will continue to work as they do now. The only
>>>>change is the worktree id for new worktrees. However, there's not an
>>>>option to preserve the existing behavior for new worktrees (nor do I think there
>>should be).
>>>
>>> I do not agree. Companies that have existing scripts should have some
>>> way to preserve their investment. Just saying "No more worktrees for
>>> you" is not really considerate.
>>
>>How exactly are your scripts depending on the worktree id? There are very few
>>reasons a script might need to know the worktree id, and I suspect that there's
>>some confusion here. The worktree name is still used with the `git worktree`
>>commands, so there no change on that front.
>
> The graphic describing this showed the id in addition to the worktree name.
> During cleanup detection, the directory of the worktree is significant. If that
> Observation is wrong, I retract all this.

So here's the graphic again:

    foo/
    ├── .git/worktrees/develop-5445874156/
    └── develop/

Here, the `develop` directory is the worktree directory (this can be
located anywhere), and the `develop-5445874156` is the worktree id.
However, the worktree id can already be something like `develop1`
or something else entirely if the `develop` directory was renamed in the
past. Again, there are very few things a script should need to know the
worktree id for.

If the `develop` directory is deleted, cleanup detection is handled
by the `git worktree prune` command, which will remove worktrees under
`.git/worktrees/*` that are no longer valid. This happens automatically
after the expiry time or it can be executed manually. Of course,
executing `git worktree remove develop` will also remove the worktree
and its associated worktree id.

Best,

Caleb
Randall S. Becker Nov. 30, 2024, 12:38 a.m. UTC | #7
On November 29, 2024 7:09 PM, Caleb White wrote:
>To: rsbecker@nexbridge.com; git@vger.kernel.org
>Cc: 'shejialuo' <shejialuo@gmail.com>; 'Junio C Hamano' <gitster@pobox.com>
>Subject: Re: [PATCH v2 0/3] Ensure unique worktree ids across repositories
>
>On Fri Nov 29, 2024 at 5:44 PM CST, rsbecker wrote:
>> On November 29, 2024 6:29 PM, Caleb White wrote:
>>>On Fri Nov 29, 2024 at 5:17 PM CST, rsbecker wrote:
>>>> On November 29, 2024 6:14 PM, Caleb White writes:
>>>>>On Fri Nov 29, 2024 at 4:54 PM CST, rsbecker wrote:
>>>>>> General comment on this series: Is there a mechanism of preserving
>>>>>> existing functionality for those of us who have existing scripts
>>>>>> that depend on the existing branch and worktree naming?
>>>>>
>>>>>Existing worktrees will continue to work as they do now. The only
>>>>>change is the worktree id for new worktrees. However, there's not an
>>>>>option to preserve the existing behavior for new worktrees (nor do I
>>>>>think there
>>>should be).
>>>>
>>>> I do not agree. Companies that have existing scripts should have
>>>> some way to preserve their investment. Just saying "No more
>>>> worktrees for you" is not really considerate.
>>>
>>>How exactly are your scripts depending on the worktree id? There are
>>>very few reasons a script might need to know the worktree id, and I
>>>suspect that there's some confusion here. The worktree name is still
>>>used with the `git worktree` commands, so there no change on that front.
>>
>> The graphic describing this showed the id in addition to the worktree name.
>> During cleanup detection, the directory of the worktree is
>> significant. If that Observation is wrong, I retract all this.
>
>So here's the graphic again:
>
>    foo/
>    ├── .git/worktrees/develop-5445874156/
>    └── develop/
>
>Here, the `develop` directory is the worktree directory (this can be located
>anywhere), and the `develop-5445874156` is the worktree id.
>However, the worktree id can already be something like `develop1` or something
>else entirely if the `develop` directory was renamed in the past. Again, there are very
>few things a script should need to know the worktree id for.
>
>If the `develop` directory is deleted, cleanup detection is handled by the `git
>worktree prune` command, which will remove worktrees under `.git/worktrees/*`
>that are no longer valid. This happens automatically after the expiry time or it can be
>executed manually. Of course, executing `git worktree remove develop` will also
>remove the worktree and its associated worktree id.

This last bit is an assumption, and not necessarily valid. Scripts that use worktrees
may maintain lists or their own pointers. It is important to be able to emulate
cleanup functions - something I discovered early in the worktree functions
when released. I need to make sure that cleanup will continue to have enough
information - prior to git worktree cleanup - to function correctly. This will
need coordination with people who have such scripts in my community. It
probably will not impact you, but I would have appreciated more than one
release notice on this capability.
Caleb White Nov. 30, 2024, 4:08 p.m. UTC | #8
On Fri Nov 29, 2024 at 6:38 PM CST, rsbecker wrote:
> On November 29, 2024 7:09 PM, Caleb White wrote:
>>If the `develop` directory is deleted, cleanup detection is handled by the `git
>>worktree prune` command, which will remove worktrees under `.git/worktrees/*`
>>that are no longer valid. This happens automatically after the expiry time or it can be
>>executed manually. Of course, executing `git worktree remove develop` will also
>>remove the worktree and its associated worktree id.
>
> This last bit is an assumption, and not necessarily valid. Scripts that use worktrees
> may maintain lists or their own pointers. It is important to be able to emulate
> cleanup functions - something I discovered early in the worktree functions
> when released. I need to make sure that cleanup will continue to have enough
> information - prior to git worktree cleanup - to function correctly. This will
> need coordination with people who have such scripts in my community. It
> probably will not impact you, but I would have appreciated more than one
> release notice on this capability.

I'm not sure I understand the specific use-case you're talking about.
Could you provide an example?

However, I suppose I can add a config / env variable to be able to
disable this new functionality.

Best,

Caleb