diff mbox series

[2/2] setup: don't fail if commondir reference is deleted.

Message ID 6f9c8775817117c2b36539eb048e2462a650ab8f.1550508544.git.msuchanek@suse.de (mailing list archive)
State New, archived
Headers show
Series worktree add race fix | expand

Commit Message

Michal Suchanek Feb. 18, 2019, 5:04 p.m. UTC
When adding wotktrees git can die in get_common_dir_noenv while
examining existing worktrees because the commondir file does not exist.
Rather than testing if the file exists before reading it handle ENOENT.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v2:
- do not test file existence first, just read it and handle ENOENT.
- handle zero size file correctly
---
 setup.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Comments

Eric Sunshine Feb. 18, 2019, 9 p.m. UTC | #1
On Mon, Feb 18, 2019 at 12:05 PM Michal Suchanek <msuchanek@suse.de> wrote:
> When adding wotktrees git can die in get_common_dir_noenv while
> examining existing worktrees because the commondir file does not exist.
> Rather than testing if the file exists before reading it handle ENOENT.

This commit message leaves the reader wondering under what conditions
"commondir file" might not exist. For instance, the reader might
wonder "iIs it simply a condition of normal operation or does it arise
under odd circumstances?". Without this information, it is difficult
for someone reading the explanation to understand if or how this code
might validly be changed in the future. Your cover letter contained
explanation which likely ought to be duplicated here as an aid to
future readers.

> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Duy Nguyen Feb. 21, 2019, 10:50 a.m. UTC | #2
On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
>
> When adding wotktrees git can die in get_common_dir_noenv while
> examining existing worktrees because the commondir file does not exist.
> Rather than testing if the file exists before reading it handle ENOENT.

I don't think we could go around fixing every access to incomplete
worktrees like this. If this is because of racy 'worktree add', then
perhaps a better solution is make it absolutely clear it's not ready
for anybody to access.

For example, we can suffix the worktree directory name with ".lock"
and make sure get_worktrees() ignores entries ending with ".lock".
That should protect other commands while 'worktree add' is still
running. Only when the worktree is complete that 'worktree add' should
rename the directory to lose ".lock" and run external commands like
git-checkout to populate the worktree.

> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
> v2:
> - do not test file existence first, just read it and handle ENOENT.
> - handle zero size file correctly
> ---
>  setup.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/setup.c b/setup.c
> index ca9e8a949ed8..dd865f280d34 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -270,12 +270,20 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
>  {
>         struct strbuf data = STRBUF_INIT;
>         struct strbuf path = STRBUF_INIT;
> -       int ret = 0;
> +       int ret;
>
>         strbuf_addf(&path, "%s/commondir", gitdir);
> -       if (file_exists(path.buf)) {
> -               if (strbuf_read_file(&data, path.buf, 0) <= 0)
> +       ret = strbuf_read_file(&data, path.buf, 0);
> +       if (ret <= 0) {
> +               /*
> +                * if file is missing or zero size (just being written)
> +                * assume default, bail otherwise
> +                */
> +               if (ret && errno != ENOENT)
>                         die_errno(_("failed to read %s"), path.buf);
> +               strbuf_addstr(sb, gitdir);
> +               ret = 0;
> +       } else {
>                 while (data.len && (data.buf[data.len - 1] == '\n' ||
>                                     data.buf[data.len - 1] == '\r'))
>                         data.len--;
> @@ -286,8 +294,6 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
>                 strbuf_addbuf(&path, &data);
>                 strbuf_add_real_path(sb, path.buf);
>                 ret = 1;
> -       } else {
> -               strbuf_addstr(sb, gitdir);
>         }
>
>         strbuf_release(&data);
> --
> 2.20.1
>
Michal Suchanek Feb. 21, 2019, 1:50 p.m. UTC | #3
On Thu, 21 Feb 2019 17:50:38 +0700
Duy Nguyen <pclouds@gmail.com> wrote:

> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
> >
> > When adding wotktrees git can die in get_common_dir_noenv while
> > examining existing worktrees because the commondir file does not exist.
> > Rather than testing if the file exists before reading it handle ENOENT.  
> 
> I don't think we could go around fixing every access to incomplete
> worktrees like this. If this is because of racy 'worktree add', then
> perhaps a better solution is make it absolutely clear it's not ready
> for anybody to access.
> 
> For example, we can suffix the worktree directory name with ".lock"
> and make sure get_worktrees() ignores entries ending with ".lock".
> That should protect other commands while 'worktree add' is still
> running. Only when the worktree is complete that 'worktree add' should
> rename the directory to lose ".lock" and run external commands like
> git-checkout to populate the worktree.

The problem is we don't forbid worktree names ending with ".lock".
Which means that if we start to forbid them now existing worktrees
might become inaccessible.

Thanks

Michal
Phillip Wood Feb. 21, 2019, 5:07 p.m. UTC | #4
Hi Michal/Duy

On 21/02/2019 13:50, Michal Suchánek wrote:
> On Thu, 21 Feb 2019 17:50:38 +0700
> Duy Nguyen <pclouds@gmail.com> wrote:
> 
>> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
>>>
>>> When adding wotktrees git can die in get_common_dir_noenv while
>>> examining existing worktrees because the commondir file does not exist.
>>> Rather than testing if the file exists before reading it handle ENOENT.
>>
>> I don't think we could go around fixing every access to incomplete
>> worktrees like this. If this is because of racy 'worktree add', then
>> perhaps a better solution is make it absolutely clear it's not ready
>> for anybody to access.
>>
>> For example, we can suffix the worktree directory name with ".lock"
>> and make sure get_worktrees() ignores entries ending with ".lock".
>> That should protect other commands while 'worktree add' is still
>> running. Only when the worktree is complete that 'worktree add' should
>> rename the directory to lose ".lock" and run external commands like
>> git-checkout to populate the worktree.
> 
> The problem is we don't forbid worktree names ending with ".lock".
> Which means that if we start to forbid them now existing worktrees
> might become inaccessible.

I think it is also racy as the renaming breaks the use of mkdir erroring 
out if the directory already exists. One solution is to have a lock 
entry in $GIT_COMMON_DIR/worktree-locks and make sure the code that 
iterates over the entries in $GIT_COMMON_DIR/worktrees skips any that 
have a corresponding ignores in $GIT_COMMON_DIR/worktree-locks. If the 
worktree-locks/<dir> is created before worktree/<dir> then it should be 
race free (you will have to remove the lock if the real entry cannot be 
created and then increment the counter and try again). Entries could 
also be locked on removal to prevent a race there.

Best Wishes

Phillip

> Thanks
> 
> Michal
>
Eric Sunshine Feb. 21, 2019, 5:12 p.m. UTC | #5
On Thu, Feb 21, 2019 at 12:07 PM Phillip Wood <phillip.wood@talktalk.net> wrote:
> On 21/02/2019 13:50, Michal Suchánek wrote:
> >> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
> > The problem is we don't forbid worktree names ending with ".lock".
> > Which means that if we start to forbid them now existing worktrees
> > might become inaccessible.
>
> I think it is also racy as the renaming breaks the use of mkdir erroring
> out if the directory already exists. One solution is to have a lock
> entry in $GIT_COMMON_DIR/worktree-locks and make sure the code that
> iterates over the entries in $GIT_COMMON_DIR/worktrees skips any that
> have a corresponding ignores in $GIT_COMMON_DIR/worktree-locks. If the
> worktree-locks/<dir> is created before worktree/<dir> then it should be
> race free (you will have to remove the lock if the real entry cannot be
> created and then increment the counter and try again). Entries could
> also be locked on removal to prevent a race there.

I wonder, though, how much this helps or hinders the use-case which
prompted this patch series in the first place; to wit, creating
hundreds or thousands of worktrees. Doing so serially was too slow, so
the many "git worktree add" invocations were instead run in parallel
(which led to "discovery" of race conditions). Using a global worktree
lock would serialize worktree creation, thus slowing it down once
again.
Phillip Wood Feb. 21, 2019, 5:27 p.m. UTC | #6
Hi Eric

On 21/02/2019 17:12, Eric Sunshine wrote:
> On Thu, Feb 21, 2019 at 12:07 PM Phillip Wood <phillip.wood@talktalk.net> wrote:
>> On 21/02/2019 13:50, Michal Suchánek wrote:
>>>> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
>>> The problem is we don't forbid worktree names ending with ".lock".
>>> Which means that if we start to forbid them now existing worktrees
>>> might become inaccessible.
>>
>> I think it is also racy as the renaming breaks the use of mkdir erroring
>> out if the directory already exists. One solution is to have a lock
>> entry in $GIT_COMMON_DIR/worktree-locks and make sure the code that
>> iterates over the entries in $GIT_COMMON_DIR/worktrees skips any that
>> have a corresponding ignores in $GIT_COMMON_DIR/worktree-locks. If the
>> worktree-locks/<dir> is created before worktree/<dir> then it should be
>> race free (you will have to remove the lock if the real entry cannot be
>> created and then increment the counter and try again). Entries could
>> also be locked on removal to prevent a race there.
> 
> I wonder, though, how much this helps or hinders the use-case which
> prompted this patch series in the first place; to wit, creating
> hundreds or thousands of worktrees. Doing so serially was too slow, so
> the many "git worktree add" invocations were instead run in parallel
> (which led to "discovery" of race conditions). Using a global worktree
> lock would serialize worktree creation, thus slowing it down once
> again.

The idea is that there are per-worktree lock stored under worktree-locks 
(hence the plural name). Using a separate directory for the locks gets 
round the problems of name clashes between the lock for a worktree 
called foo and one called foo.lock and means we can rely on mkdir 
erroring out if the worktree name already exists as there is no renaming.

Best Wishes

Phillip
Michal Suchanek Feb. 21, 2019, 5:33 p.m. UTC | #7
On Thu, 21 Feb 2019 12:12:28 -0500
Eric Sunshine <sunshine@sunshineco.com> wrote:

> On Thu, Feb 21, 2019 at 12:07 PM Phillip Wood <phillip.wood@talktalk.net> wrote:
> > On 21/02/2019 13:50, Michal Suchánek wrote:  
> > >> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:  
> > > The problem is we don't forbid worktree names ending with ".lock".
> > > Which means that if we start to forbid them now existing worktrees
> > > might become inaccessible.  
> >
> > I think it is also racy as the renaming breaks the use of mkdir erroring
> > out if the directory already exists. One solution is to have a lock
> > entry in $GIT_COMMON_DIR/worktree-locks and make sure the code that
> > iterates over the entries in $GIT_COMMON_DIR/worktrees skips any that
> > have a corresponding ignores in $GIT_COMMON_DIR/worktree-locks. If the
> > worktree-locks/<dir> is created before worktree/<dir> then it should be
> > race free (you will have to remove the lock if the real entry cannot be
> > created and then increment the counter and try again). Entries could
> > also be locked on removal to prevent a race there.  
> 
> I wonder, though, how much this helps or hinders the use-case which
> prompted this patch series in the first place; to wit, creating
> hundreds or thousands of worktrees. Doing so serially was too slow, so
> the many "git worktree add" invocations were instead run in parallel
> (which led to "discovery" of race conditions). Using a global worktree
> lock would serialize worktree creation, thus slowing it down once
> again.

I created thousands of worktrees only for stress-testing. The real
workload needs only a dozen of them. That still leads to hitting a
race condition occasionally and automation failure.

Creating a separate lock directory will probably work. The question is
when do you need to take the lock. Before adding a worktree, sure.
Before deleting it as well. The problem is that deleting a worktree
successfully without creating some broken state needs to exclude
processes that might add stuff in the worktree directory. How many
operations then do *not* need to take the lock?

Thanks

Michal
Duy Nguyen Feb. 22, 2019, 9:26 a.m. UTC | #8
On Thu, Feb 21, 2019 at 8:50 PM Michal Suchánek <msuchanek@suse.de> wrote:
>
> On Thu, 21 Feb 2019 17:50:38 +0700
> Duy Nguyen <pclouds@gmail.com> wrote:
>
> > On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
> > >
> > > When adding wotktrees git can die in get_common_dir_noenv while
> > > examining existing worktrees because the commondir file does not exist.
> > > Rather than testing if the file exists before reading it handle ENOENT.
> >
> > I don't think we could go around fixing every access to incomplete
> > worktrees like this. If this is because of racy 'worktree add', then
> > perhaps a better solution is make it absolutely clear it's not ready
> > for anybody to access.
> >
> > For example, we can suffix the worktree directory name with ".lock"
> > and make sure get_worktrees() ignores entries ending with ".lock".
> > That should protect other commands while 'worktree add' is still
> > running. Only when the worktree is complete that 'worktree add' should
> > rename the directory to lose ".lock" and run external commands like
> > git-checkout to populate the worktree.
>
> The problem is we don't forbid worktree names ending with ".lock".
> Which means that if we start to forbid them now existing worktrees
> might become inaccessible.

Worktrees ending with .lock will not work well now anyway. While [1]
reports the problem with worktree names having a whitespace, ".lock"
is in the same class (not a valid refname) and will result the same
error. So if you have "*.lock" worktrees now you're already in
trouble.

[1] https://public-inbox.org/git/1550673274.30738.0@yandex.ru/T/#m9d86e0a388fd4961bc102c2c69e8bc3b2db07a42
Duy Nguyen Feb. 22, 2019, 9:32 a.m. UTC | #9
On Fri, Feb 22, 2019 at 12:07 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> Hi Michal/Duy
>
> On 21/02/2019 13:50, Michal Suchánek wrote:
> > On Thu, 21 Feb 2019 17:50:38 +0700
> > Duy Nguyen <pclouds@gmail.com> wrote:
> >
> >> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
> >>>
> >>> When adding wotktrees git can die in get_common_dir_noenv while
> >>> examining existing worktrees because the commondir file does not exist.
> >>> Rather than testing if the file exists before reading it handle ENOENT.
> >>
> >> I don't think we could go around fixing every access to incomplete
> >> worktrees like this. If this is because of racy 'worktree add', then
> >> perhaps a better solution is make it absolutely clear it's not ready
> >> for anybody to access.
> >>
> >> For example, we can suffix the worktree directory name with ".lock"
> >> and make sure get_worktrees() ignores entries ending with ".lock".
> >> That should protect other commands while 'worktree add' is still
> >> running. Only when the worktree is complete that 'worktree add' should
> >> rename the directory to lose ".lock" and run external commands like
> >> git-checkout to populate the worktree.
> >
> > The problem is we don't forbid worktree names ending with ".lock".
> > Which means that if we start to forbid them now existing worktrees
> > might become inaccessible.
>
> I think it is also racy as the renaming breaks the use of mkdir erroring
> out if the directory already exists.

You mean the part where we see "fred" exists and decide to try the
name "fred1" instead (i.e. patch 1/2)?

I don't think it's the problem if that's the case. We mkdir
"fred.lock" _then_ check if "fred" exists. If it does, remove
fred.lock and move on to fred1.lock. Then we rename fred1.lock to
fred1 and error out if rename fails.
Phillip Wood Feb. 22, 2019, 10:20 a.m. UTC | #10
Hi Duy

On 22/02/2019 09:32, Duy Nguyen wrote:
> On Fri, Feb 22, 2019 at 12:07 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>>
>> Hi Michal/Duy
>>
>> On 21/02/2019 13:50, Michal Suchánek wrote:
>>> On Thu, 21 Feb 2019 17:50:38 +0700
>>> Duy Nguyen <pclouds@gmail.com> wrote:
>>>
>>>> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
>>>>>
>>>>> When adding wotktrees git can die in get_common_dir_noenv while
>>>>> examining existing worktrees because the commondir file does not exist.
>>>>> Rather than testing if the file exists before reading it handle ENOENT.
>>>>
>>>> I don't think we could go around fixing every access to incomplete
>>>> worktrees like this. If this is because of racy 'worktree add', then
>>>> perhaps a better solution is make it absolutely clear it's not ready
>>>> for anybody to access.
>>>>
>>>> For example, we can suffix the worktree directory name with ".lock"
>>>> and make sure get_worktrees() ignores entries ending with ".lock".
>>>> That should protect other commands while 'worktree add' is still
>>>> running. Only when the worktree is complete that 'worktree add' should
>>>> rename the directory to lose ".lock" and run external commands like
>>>> git-checkout to populate the worktree.
>>>
>>> The problem is we don't forbid worktree names ending with ".lock".
>>> Which means that if we start to forbid them now existing worktrees
>>> might become inaccessible.
>>
>> I think it is also racy as the renaming breaks the use of mkdir erroring
>> out if the directory already exists.
> 
> You mean the part where we see "fred" exists and decide to try the
> name "fred1" instead (i.e. patch 1/2)?
> 
> I don't think it's the problem if that's the case. We mkdir
> "fred.lock" _then_ check if "fred" exists. If it does, remove
> fred.lock and move on to fred1.lock. Then we rename fred1.lock to
> fred1 and error out if rename fails.

Ah you're right, if another process tries to create fred.lock as we're 
renaming it either their mkdir fred.lock will fail or they'll see fred 
once they've made fred.lock

Sorry for the confusion

Phillip
Michal Suchanek March 4, 2019, 1:30 p.m. UTC | #11
Hello,

On Thu, 21 Feb 2019 17:27:04 +0000
Phillip Wood <phillip.wood@talktalk.net> wrote:

> Hi Eric
> 
> On 21/02/2019 17:12, Eric Sunshine wrote:
> > On Thu, Feb 21, 2019 at 12:07 PM Phillip Wood <phillip.wood@talktalk.net> wrote:  
> >> On 21/02/2019 13:50, Michal Suchánek wrote:  
> >>>> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:  
> >>> The problem is we don't forbid worktree names ending with ".lock".
> >>> Which means that if we start to forbid them now existing worktrees
> >>> might become inaccessible.  
> >>
> >> I think it is also racy as the renaming breaks the use of mkdir erroring
> >> out if the directory already exists. One solution is to have a lock
> >> entry in $GIT_COMMON_DIR/worktree-locks and make sure the code that
> >> iterates over the entries in $GIT_COMMON_DIR/worktrees skips any that
> >> have a corresponding ignores in $GIT_COMMON_DIR/worktree-locks. If the
> >> worktree-locks/<dir> is created before worktree/<dir> then it should be
> >> race free (you will have to remove the lock if the real entry cannot be
> >> created and then increment the counter and try again). Entries could
> >> also be locked on removal to prevent a race there.  
> > 
> > I wonder, though, how much this helps or hinders the use-case which
> > prompted this patch series in the first place; to wit, creating
> > hundreds or thousands of worktrees. Doing so serially was too slow, so
> > the many "git worktree add" invocations were instead run in parallel
> > (which led to "discovery" of race conditions). Using a global worktree
> > lock would serialize worktree creation, thus slowing it down once
> > again.  
> 
> The idea is that there are per-worktree lock stored under worktree-locks 
> (hence the plural name). Using a separate directory for the locks gets 
> round the problems of name clashes between the lock for a worktree 
> called foo and one called foo.lock and means we can rely on mkdir 
> erroring out if the worktree name already exists as there is no renaming.

I suppose this separate directory would work. When are you supposed to
take the lock, though?

When adding worktree, sure.

When managing worktrees, sure. Otherwise you would see the incomplete
worktrees.

When doing anything in git? Probably. Because otherwise you could
accidentally use the incomplete worktree. Or somebody deleting worktree
would fail removing it because you would keep adding files to it.

Isn't git supposed to allow parallel access to the repository?

As things stand if you wanted to implement worktree locking you would
need to lock the worktree for *every* operation that touches it, and
for many operations you would have to lock/unlock *all* worktrees one by
one to find the worktree you are supposed to work on.

I don't feel like adding locking to all of git to fix this problem.

Sure, adding enough locking to ensure repository consistency at all
times would be nice but it also needs to be granular enough to not harm
performance. I can't say I understand the git repository layout and
usage well enough to design that.

Thanks

Michal
diff mbox series

Patch

diff --git a/setup.c b/setup.c
index ca9e8a949ed8..dd865f280d34 100644
--- a/setup.c
+++ b/setup.c
@@ -270,12 +270,20 @@  int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 {
 	struct strbuf data = STRBUF_INIT;
 	struct strbuf path = STRBUF_INIT;
-	int ret = 0;
+	int ret;
 
 	strbuf_addf(&path, "%s/commondir", gitdir);
-	if (file_exists(path.buf)) {
-		if (strbuf_read_file(&data, path.buf, 0) <= 0)
+	ret = strbuf_read_file(&data, path.buf, 0);
+	if (ret <= 0) {
+		/*
+		 * if file is missing or zero size (just being written)
+		 * assume default, bail otherwise
+		 */
+		if (ret && errno != ENOENT)
 			die_errno(_("failed to read %s"), path.buf);
+		strbuf_addstr(sb, gitdir);
+		ret = 0;
+	} else {
 		while (data.len && (data.buf[data.len - 1] == '\n' ||
 				    data.buf[data.len - 1] == '\r'))
 			data.len--;
@@ -286,8 +294,6 @@  int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 		strbuf_addbuf(&path, &data);
 		strbuf_add_real_path(sb, path.buf);
 		ret = 1;
-	} else {
-		strbuf_addstr(sb, gitdir);
 	}
 
 	strbuf_release(&data);