[v3,2/2] setup: don't fail if commondir reference is deleted.

Message ID 37df7fd81c3dee990bd7723f18c94713a0d842b6.1550679076.git.msuchanek@suse.de
State New, archived

Commit Message

Michal Suchanek Feb. 20, 2019, 4:16 p.m. UTC
Apparently it can happen that stat() claims there is a commondir file but when
trying to open the file it is missing.

Another even rarer issue is that the file might be zero size because another
process initializing a worktree opened the file but has not written its content
yet.

When any of this happnes git aborts failing to perform perfectly valid
command because unrelated worktree is not yet fully initialized.

Rather than testing if the file exists before reading it handle ENOENT
and ENOTDIR.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v2:
- do not test file existence first, just read it and handle ENOENT.
- handle zero size file correctly
v3:
- handle ENOTDIR as well
- add more details to commit message
---
 setup.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)
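
The pattern the commit message describes (read first, then treat ENOENT and ENOTDIR as "file absent" instead of checking existence beforehand) can be sketched with plain POSIX calls. This is only an illustration under stated assumptions: `read_if_present()` and its return codes are hypothetical and are not part of Git, which uses strbuf_read_file() here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a Git function): read up to cap - 1 bytes
 * of `path` into `buf` and NUL-terminate the result.
 *
 * Returns the number of bytes read (0 for an empty file), -1 if the
 * file is absent (ENOENT or ENOTDIR), and -2 on any other error.
 * There is no separate existence check, so there is no window between
 * "check" and "open" in which the file can disappear.
 */
static long read_if_present(const char *path, char *buf, size_t cap)
{
	size_t total = 0;
	ssize_t n;
	int fd;

	assert(path && buf && cap > 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (errno == ENOENT || errno == ENOTDIR) ? -1 : -2;

	while (total < cap - 1) {
		n = read(fd, buf + total, cap - 1 - total);
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted, retry the read */
			close(fd);
			return -2;
		}
		if (n == 0)
			break; /* end of file */
		total += (size_t)n;
	}
	close(fd);
	buf[total] = '\0';
	return (long)total;
}
```

A caller can then decide separately what an absent file (-1) and an empty file (0) should mean, which is the question the review below turns on.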

Comments

Eric Sunshine Feb. 20, 2019, 4:55 p.m. UTC | #1
On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:
> Apparently it can happen that stat() claims there is a commondir file but when
> trying to open the file it is missing.

Under what circumstances?

> Another even rarer issue is that the file might be zero size because another
> process initializing a worktree opened the file but has not written its content
> yet.

Based upon the explanation thus far, I'm having trouble understanding
under what circumstances these race conditions can arise. Are you
trying to invoke Git commands in a particular worktree even as the
worktree itself is being created?

Without this information being spelled out clearly, it is going to be
difficult for someone in the future to reason about why the code is
the way it is following this change.

> When any of this happnes git aborts failing to perform perfectly valid
> command because unrelated worktree is not yet fully initialized.

s/happnes/happens/

> Rather than testing if the file exists before reading it handle ENOENT
> and ENOTDIR.

One more comment below...

> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
> diff --git a/setup.c b/setup.c
> @@ -270,12 +270,20 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
>  {
>         strbuf_addf(&path, "%s/commondir", gitdir);
> -       if (file_exists(path.buf)) {
> -               if (strbuf_read_file(&data, path.buf, 0) <= 0)
> +       ret = strbuf_read_file(&data, path.buf, 0);
> +       if (ret <= 0) {
> +               /*
> +                * if file is missing or zero size (just being written)
> +                * assume default, bail otherwise
> +                */
> +               if (ret && errno != ENOENT && errno != ENOTDIR)
>                         die_errno(_("failed to read %s"), path.buf);

It's not clear from the explanation given in the commit message if the
new behavior is indeed sensible. The original intent of the code, as I
understand it, is to validate "commondir", to ensure that it is not
somehow corrupt (such as the user editing it and making it empty).
Following this change, that particular validation no longer takes
place. But, more importantly, what does it mean to fall back to
"default" for this particular worktree? I'm having trouble
understanding how the new behavior can be correct or desirable. (Am I
missing something obvious?)

> +               strbuf_addstr(sb, gitdir);
> +               ret = 0;
> +       } else {
Michal Suchanek Feb. 20, 2019, 5:16 p.m. UTC | #2
On Wed, 20 Feb 2019 11:55:46 -0500
Eric Sunshine <sunshine@sunshineco.com> wrote:

> On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:
> > Apparently it can happen that stat() claims there is a commondir file but when
> > trying to open the file it is missing.  
> 
> Under what circumstances?

I would like to know that as well. The only command tested was worktree
add which should not remove the file. Nonetheless running many worktree
add commands in parallel can cause the file to go away for some of
them. For many commands git calls itself recursively so there is
probably much more going on than the single function that creates the
worktree.

> 
> > Another even rarer issue is that the file might be zero size because another
> > process initializing a worktree opened the file but has not written its content
> > yet.  
> 
> Based upon the explanation thus far, I'm having trouble understanding
> under what circumstances these race conditions can arise. Are you
> trying to invoke Git commands in a particular worktree even as the
> worktree itself is being created?

It's explained in the following paragraph. If you have multiple
worktrees some *other* worktree may be uninitialized.

> 
> Without this information being spelled out clearly, it is going to be
> difficult for someone in the future to reason about why the code is
> the way it is following this change.
> 
> > When any of this happnes git aborts failing to perform perfectly valid
> > command because unrelated worktree is not yet fully initialized.  
> 
> s/happnes/happens/
> 
> > Rather than testing if the file exists before reading it handle ENOENT
> > and ENOTDIR.  
> 
> One more comment below...
> 
> > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> > ---
> > diff --git a/setup.c b/setup.c
> > @@ -270,12 +270,20 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
> >  {
> >         strbuf_addf(&path, "%s/commondir", gitdir);
> > -       if (file_exists(path.buf)) {
> > -               if (strbuf_read_file(&data, path.buf, 0) <= 0)
> > +       ret = strbuf_read_file(&data, path.buf, 0);
> > +       if (ret <= 0) {
> > +               /*
> > +                * if file is missing or zero size (just being written)
> > +                * assume default, bail otherwise
> > +                */
> > +               if (ret && errno != ENOENT && errno != ENOTDIR)
> >                         die_errno(_("failed to read %s"), path.buf);  
> 
> It's not clear from the explanation given in the commit message if the
> new behavior is indeed sensible. The original intent of the code, as I
> understand it, is to validate "commondir", to ensure that it is not
> somehow corrupt (such as the user editing it and making it empty).

How is it validated in the code below when it is non-zero size?

There is *no* validation whatsoever. Yet zero size is somehow totally
unacceptable and requires that git working in *any* worktree aborts if
commondir file in *any* worktree is zero size.

> Following this change, that particular validation no longer takes
> place. But, more importantly, what does it mean to fall back to
> "default" for this particular worktree? I'm having trouble
> understanding how the new behavior can be correct or desirable. (Am I
> missing something obvious?)

If the file can be missing altogether and it is not an error how it is
incorrect or undesirable to ignore zero size file?

Thanks

Michal
Eric Sunshine Feb. 20, 2019, 6:35 p.m. UTC | #3
On Wed, Feb 20, 2019 at 12:16 PM Michal Suchánek <msuchanek@suse.de> wrote:
> On Wed, 20 Feb 2019 11:55:46 -0500
> Eric Sunshine <sunshine@sunshineco.com> wrote:
> > On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:
> > > Apparently it can happen that stat() claims there is a commondir file but when
> > > trying to open the file it is missing.
> >
> > Under what circumstances?
>
> I would like to know that as well. The only command tested was worktree
> add which should not remove the file. Nonetheless running many worktree
> add commands in parallel can cause the file to go away for some of
> them.

You actually encountered this particular error message, correct? Was
that before or after you fixed the race in builtin/worktree.c itself
via patch 1/2? Did the reported 'errno' indicate that the file did not
exist or was it some other error?

> For many commands git calls itself recursively so there is
> probably much more going on than the single function that creates the
> worktree.

"git worktree add" is careful to invoke other Git commands only after
"commondir" exists, so it's not clear how this circumstance arises if
the file is indeed missing by the time the other Git command is run.

> > > Another even rarer issue is that the file might be zero size because another
> > > process initializing a worktree opened the file but has not written its content
> > > yet.
> >
> > Based upon the explanation thus far, I'm having trouble understanding
> > under what circumstances these race conditions can arise. Are you
> > trying to invoke Git commands in a particular worktree even as the
> > worktree itself is being created?
>
> It's explained in the following paragraph. If you have multiple
> worktrees some *other* worktree may be uninitialized.

I understand that, but setup.c:get_common_dir_noenv() is concerned
only with _this_ worktree -- the one in which the Git command is being
run -- so it's not clear if or how some other partially-initialized
worktree could have any impact. (And, I'm having trouble fathoming how
it could, which is why I'm asking these questions).

Is it possible that when you saw that error message, it actually arose
from some code other than setup.c:get_common_dir_noenv()?

> > > -       if (file_exists(path.buf)) {
> > > -               if (strbuf_read_file(&data, path.buf, 0) <= 0)
> > > +       ret = strbuf_read_file(&data, path.buf, 0);
> > > +       if (ret <= 0) {
> > > +               /*
> > > +                * if file is missing or zero size (just being written)
> > > +                * assume default, bail otherwise
> > > +                */
> > > +               if (ret && errno != ENOENT && errno != ENOTDIR)
> > >                         die_errno(_("failed to read %s"), path.buf);
> >
> > It's not clear from the explanation given in the commit message if the
> > new behavior is indeed sensible. The original intent of the code, as I
> > understand it, is to validate "commondir", to ensure that it is not
> > somehow corrupt (such as the user editing it and making it empty).
>
> How is it validated in the code below when it is non-zero size?

Checking whether the file has content _is_ a form of validation, even
if not extensive validation.

> There is *no* validation whatsoever. Yet zero size is somehow totally
> unacceptable and requires that git working in *any* worktree aborts if
> commondir file in *any* worktree is zero size.

As noted above, it's not clear from the commit message how this case
can arise given that setup.c:get_common_dir_noenv() is presumably
concerned with and only consults _this_ worktree, so I'm having
trouble understanding how the state of other worktrees could impact
it.

> > Following this change, that particular validation no longer takes
> > place. But, more importantly, what does it mean to fall back to
> > "default" for this particular worktree? I'm having trouble
> > understanding how the new behavior can be correct or desirable. (Am I
> > missing something obvious?)
>
> If the file can be missing altogether and it is not an error how it is
> incorrect or undesirable to ignore zero size file?

Because the _presence_ of that file indicates a linked worktree,
whereas its absence indicates the main worktree. If the file is
present but empty, then that is an abnormal condition, i.e. some form
of corruption.

The difference is significant, and that's why I'm asking if the new
behavior is correct or desirable. If you start interpreting this
abnormal condition as a non-error, then get_common_dir_noenv() will be
reporting that this is the main worktree when in fact it is (a somehow
corrupted) linked worktree. Such false reporting could trigger
undesirable and outright wrong behavior in callers.
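
The distinction drawn above (absent file means main worktree, non-empty file means linked worktree, empty file is abnormal) can be written down as a small decision function. This is a sketch only: the enum, the function name, and the `lenient` flag are hypothetical and do not exist in Git. `lenient` models the behavior proposed in this patch, where an empty file falls back to the default.

```c
#include <assert.h>

/* Hypothetical classification, for illustration only. */
enum worktree_kind { WT_MAIN, WT_LINKED, WT_CORRUPT };

/*
 * present: non-zero if the "commondir" file exists
 * len:     number of bytes read from it (0 if empty or absent)
 * lenient: non-zero to treat an empty file like an absent one
 *          (the patched behavior), zero to flag it as corruption
 */
static enum worktree_kind classify_commondir(int present,
					     unsigned long len,
					     int lenient)
{
	assert(present || len == 0); /* an absent file has no content */

	if (!present)
		return WT_MAIN;
	if (len == 0)
		return lenient ? WT_MAIN : WT_CORRUPT;
	return WT_LINKED;
}
```

With `lenient` set, a linked worktree whose file is momentarily empty is reported as `WT_MAIN`, which is the misreporting this reply is concerned about.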
Eric Sunshine Feb. 21, 2019, 9:27 a.m. UTC | #4
On Wed, Feb 20, 2019 at 1:35 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Wed, Feb 20, 2019 at 12:16 PM Michal Suchánek <msuchanek@suse.de> wrote:
> > On Wed, 20 Feb 2019 11:55:46 -0500
> > Eric Sunshine <sunshine@sunshineco.com> wrote:
> > > On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:
> > > > Another even rarer issue is that the file might be zero size because another
> > > > process initializing a worktree opened the file but has not written its content
> > > > yet.
> > >
> > > Based upon the explanation thus far, I'm having trouble understanding
> > > under what circumstances these race conditions can arise. Are you
> > > trying to invoke Git commands in a particular worktree even as the
> > > worktree itself is being created?
> >
> > It's explained in the following paragraph. If you have multiple
> > worktrees some *other* worktree may be uninitialized.
>
> I understand that, but setup.c:get_common_dir_noenv() is concerned
> only with _this_ worktree -- the one in which the Git command is being
> run -- so it's not clear if or how some other partially-initialized
> worktree could have any impact. (And, I'm having trouble fathoming how
> it could, which is why I'm asking these questions).

I still can't see how setup.c:get_common_dir_noenv() could be
responsible for the behavior you're describing of _any_ Git command
erroring out due to _any_ worktree being incompletely-initialized.
However, I can imagine "git worktree add" itself being racy and
failing due to a missing or empty "commondir" file for some other
worktree since that command _does_ consult other worktree entries when
validating the "add" operation via
builtin/worktree.c:validate_worktree_add() which calls
get_worktrees(). If get_worktrees() is subject to that raciness
problem, then "git worktree add" will inherit that undesirable
raciness behavior (as will other "git worktree" commands which call
get_worktrees(), such as "git worktree list").

> Is it possible that when you saw that error message, it actually arose
> from some code other than setup.c:get_common_dir_noenv()?

So, I'm suspecting get_worktrees() or some function it calls (and so
on) as the racy culprit.
Michal Suchanek Feb. 21, 2019, 11:13 a.m. UTC | #5
On Thu, 21 Feb 2019 04:27:21 -0500
Eric Sunshine <sunshine@sunshineco.com> wrote:

> On Wed, Feb 20, 2019 at 1:35 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > On Wed, Feb 20, 2019 at 12:16 PM Michal Suchánek <msuchanek@suse.de> wrote:  
> > > On Wed, 20 Feb 2019 11:55:46 -0500
> > > Eric Sunshine <sunshine@sunshineco.com> wrote:  
> > > > On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:  
> > > > > Another even rarer issue is that the file might be zero size because another
> > > > > process initializing a worktree opened the file but has not written its content
> > > > > yet.  
> > > >
> > > > Based upon the explanation thus far, I'm having trouble understanding
> > > > under what circumstances these race conditions can arise. Are you
> > > > trying to invoke Git commands in a particular worktree even as the
> > > > worktree itself is being created?  
> > >
> > > It's explained in the following paragraph. If you have multiple
> > > worktrees some *other* worktree may be uninitialized.
> >
> > I understand that, but setup.c:get_common_dir_noenv() is concerned
> > only with _this_ worktree -- the one in which the Git command is being
> > run -- so it's not clear if or how some other partially-initialized
> > worktree could have any impact. (And, I'm having trouble fathoming how
> > it could, which is why I'm asking these questions).  
> 
> I still can't see how setup.c:get_common_dir_noenv() could be
> responsible for the behavior you're describing of _any_ Git command
> erroring out due to _any_ worktree being incompletely-initialized.
> However, I can imagine "git worktree add" itself being racy and
> failing due to a missing or empty "commondir" file for some other
> worktree since that command _does_ consult other worktree entries when
> validating the "add" operation via
> builtin/worktree.c:validate_worktree_add() which calls
> get_worktrees(). If get_worktrees() is subject to that raciness
> problem, then "git worktree add" will inherit that undesirable
> raciness behavior (as will other "git worktree" commands which call
> get_worktrees(), such as "git worktree list").
> 
> > Is it possible that when you saw that error message, it actually arose
> > from some code other than setup.c:get_common_dir_noenv()?  
> 
> So, I'm suspecting get_worktrees() or some function it calls (and so
> on) as the racy culprit.

Yes, that's my explanation for the situation as well.

Thanks

Michal
Michal Suchanek Feb. 21, 2019, 11:19 a.m. UTC | #6
On Wed, 20 Feb 2019 13:35:57 -0500
Eric Sunshine <sunshine@sunshineco.com> wrote:

> On Wed, Feb 20, 2019 at 12:16 PM Michal Suchánek <msuchanek@suse.de> wrote:
> > On Wed, 20 Feb 2019 11:55:46 -0500
> > Eric Sunshine <sunshine@sunshineco.com> wrote:  

> > > Following this change, that particular validation no longer takes
> > > place. But, more importantly, what does it mean to fall back to
> > > "default" for this particular worktree? I'm having trouble
> > > understanding how the new behavior can be correct or desirable. (Am I
> > > missing something obvious?)  
> >
> > If the file can be missing altogether and it is not an error how it is
> > incorrect or undesirable to ignore zero size file?  
> 
> Because the _presence_ of that file indicates a linked worktree,
> whereas its absence indicates the main worktree. If the file is
> present but empty, then that is an abnormal condition, i.e. some form
> of corruption.
> 
> The difference is significant, and that's why I'm asking if the new
> behavior is correct or desirable. If you start interpreting this
> abnormal condition as a non-error, then get_common_dir_noenv() will be
> reporting that this is the main worktree when in fact it is (a somehow
> corrupted) linked worktree. Such false reporting could trigger
> undesirable and outright wrong behavior in callers.

This is not an issue introduced with this patch, however. The worktree
is not initialized atomically. First the worktree directory is created
and then it is populated with content including the commondir reference.

Because there is no big repository lock that everyone takes to access
a repository other running git processes can see the worktree without
the commondir file.

The way this is mitigated in users of get_worktrees() is an assumption
that the first worktree is the main worktree.

Whether this is sufficient is not something this patchset aims to address.
It merely addresses get_worktrees() aborting due to hitting specific
stage in the initialization of a worktree.

Thanks

Michal
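
For context on the non-atomic initialization described above: a common POSIX technique for keeping readers from ever seeing an empty or half-written file is to write a temporary file and rename() it into place. The sketch below is illustrative only. It is not what the patch does and not a claim about how Git's worktree code is written; `write_file_atomically()` and the ".tmp" suffix are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write `content` to `path` so that a concurrent
 * reader sees either no file (or the old one) or the complete new
 * content, never a zero-size file. Returns 0 on success, -1 on failure.
 *
 * Assumption: only one writer uses a given path at a time, because the
 * temporary name "<path>.tmp" is fixed.
 */
static int write_file_atomically(const char *path, const char *content)
{
	char tmp[4096];
	size_t len;
	FILE *f;
	int n;

	assert(path && content);

	n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1; /* path too long for the temporary name */

	f = fopen(tmp, "w");
	if (!f)
		return -1;

	len = strlen(content);
	if (fwrite(content, 1, len, f) != len) {
		fclose(f);
		remove(tmp);
		return -1;
	}
	if (fclose(f) != 0) {
		remove(tmp);
		return -1;
	}

	/* rename() replaces the destination in a single step on POSIX */
	if (rename(tmp, path) != 0) {
		remove(tmp);
		return -1;
	}
	return 0;
}
```

This only removes the zero-size window for a single file. It does not address the missing-file case or the ordering of directory creation versus file creation that the thread also discusses.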
Patch

diff --git a/setup.c b/setup.c
index ca9e8a949ed8..49306e36990d 100644
--- a/setup.c
+++ b/setup.c
@@ -270,12 +270,20 @@  int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 {
 	struct strbuf data = STRBUF_INIT;
 	struct strbuf path = STRBUF_INIT;
-	int ret = 0;
+	int ret;
 
 	strbuf_addf(&path, "%s/commondir", gitdir);
-	if (file_exists(path.buf)) {
-		if (strbuf_read_file(&data, path.buf, 0) <= 0)
+	ret = strbuf_read_file(&data, path.buf, 0);
+	if (ret <= 0) {
+		/*
+		 * if file is missing or zero size (just being written)
+		 * assume default, bail otherwise
+		 */
+		if (ret && errno != ENOENT && errno != ENOTDIR)
 			die_errno(_("failed to read %s"), path.buf);
+		strbuf_addstr(sb, gitdir);
+		ret = 0;
+	} else {
 		while (data.len && (data.buf[data.len - 1] == '\n' ||
 				    data.buf[data.len - 1] == '\r'))
 			data.len--;
@@ -286,8 +294,6 @@  int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 		strbuf_addbuf(&path, &data);
 		strbuf_add_real_path(sb, path.buf);
 		ret = 1;
-	} else {
-		strbuf_addstr(sb, gitdir);
 	}
 
 	strbuf_release(&data);