Message ID | 84df95be620c76afed73d1679722459e2ff32018.1647033303.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | Builtin FSMonitor Part 2.5 |
On Fri, Mar 11 2022, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> fixup! fsmonitor--daemon: use a cookie file to sync with file system
>
> Use implicit definitions for FCIR_ enum values.
>
> Remove const from cookie->name.
>
> Reverse if then and else branches around open() to ease readability.
>
> Document that we don't care about errors from close() and unlink().
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  builtin/fsmonitor--daemon.c | 53 +++++++++++++++++++++----------------
>  1 file changed, 30 insertions(+), 23 deletions(-)
>
> diff --git a/builtin/fsmonitor--daemon.c b/builtin/fsmonitor--daemon.c
> index 97ca2a356e5..02a99ce98a2 100644
> --- a/builtin/fsmonitor--daemon.c
> +++ b/builtin/fsmonitor--daemon.c
> @@ -109,14 +109,14 @@ static int do_as_client__status(void)
>
>  enum fsmonitor_cookie_item_result {
>  	FCIR_ERROR = -1, /* could not create cookie file ? */
> -	FCIR_INIT = 0,
> +	FCIR_INIT,
>  	FCIR_SEEN,
>  	FCIR_ABORT,
>  };
>
>  struct fsmonitor_cookie_item {
>  	struct hashmap_entry entry;
> -	const char *name;
> +	char *name;
>  	enum fsmonitor_cookie_item_result result;
>  };
>
> @@ -166,37 +166,44 @@ static enum fsmonitor_cookie_item_result with_lock__wait_for_cookie(
>  	 * that the listener thread has seen it.
>  	 */
>  	fd = open(cookie_pathname.buf, O_WRONLY | O_CREAT | O_EXCL, 0600);
> -	if (fd >= 0) {
> -		close(fd);
> -		unlink(cookie_pathname.buf);
> -
> -		/*
> -		 * Technically, this is an infinite wait (well, unless another
> -		 * thread sends us an abort).  I'd like to change this to
> -		 * use `pthread_cond_timedwait()` and return an error/timeout
> -		 * and let the caller do the trivial response thing, but we
> -		 * don't have that routine in our thread-utils.
> -		 *
> -		 * After extensive beta testing I'm not really worried about
> -		 * this.  Also note that the above open() and unlink() calls
> -		 * will cause at least two FS events on that path, so the odds
> -		 * of getting stuck are pretty slim.
> -		 */
> -		while (cookie->result == FCIR_INIT)
> -			pthread_cond_wait(&state->cookies_cond,
> -					  &state->main_lock);
> -	} else {
> +	if (fd < 0) {
>  		error_errno(_("could not create fsmonitor cookie '%s'"),
>  			    cookie->name);
>
>  		cookie->result = FCIR_ERROR;
> +		goto done;
>  	}
>
> +	/*
> +	 * Technically, close() and unlink() can fail, but we don't
> +	 * care here.  We only created the file to trigger a watch
> +	 * event from the FS to know that when we're up to date.
> +	 */
> +	close(fd);

It still seems odd to explicitly want to ignore close() return values.

I realize that we do in (too many) existing places, but why wouldn't we
want to e.g. catch an I/O error here early?

On 3/14/2022 4:00 AM, Ævar Arnfjörð Bjarmason wrote:
>
> On Fri, Mar 11 2022, Jeff Hostetler via GitGitGadget wrote:

>> +	/*
>> +	 * Technically, close() and unlink() can fail, but we don't
>> +	 * care here.  We only created the file to trigger a watch
>> +	 * event from the FS to know that when we're up to date.
>> +	 */
>> +	close(fd);
>
> It still seems odd to explicitly want to ignore close() return values.
>
> I realize that we do in (too many) existing places, but why wouldn't we
> want to e.g. catch an I/O error here early?

What exactly do you propose we do here if there is an I/O error
during close()?

Thanks,
-Stolee

Derrick Stolee <derrickstolee@github.com> writes:

> On 3/14/2022 4:00 AM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, Mar 11 2022, Jeff Hostetler via GitGitGadget wrote:
>
>>> +	/*
>>> +	 * Technically, close() and unlink() can fail, but we don't
>>> +	 * care here.  We only created the file to trigger a watch
>>> +	 * event from the FS to know that when we're up to date.
>>> +	 */
>>> +	close(fd);
>>
>> It still seems odd to explicitly want to ignore close() return values.
>>
>> I realize that we do in (too many) existing places, but why wouldn't we
>> want to e.g. catch an I/O error here early?
>
> What exactly do you propose we do here if there is an I/O error
> during close()?

We created the file to trigger a watch event, but now we have a
reason to suspect that the wished-for watch event may not come.

We only did so to know that when we're up to date.  Now we may never
know?  We may go without realizing we are already up to date a bit
longer than the reality?

How much damage would it cause us to miss a watch event in this
case?  Very little?  Is it a thing that sysadmins may care if we see
too many of, but there is nothing the end user can immediately do
about?  If it is, perhaps a trace2 event to report it (and other "we
do not care here" syscalls that fail)?

On 3/14/22 1:47 PM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
>
>> On 3/14/2022 4:00 AM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Fri, Mar 11 2022, Jeff Hostetler via GitGitGadget wrote:
>>
>>>> +	/*
>>>> +	 * Technically, close() and unlink() can fail, but we don't
>>>> +	 * care here.  We only created the file to trigger a watch
>>>> +	 * event from the FS to know that when we're up to date.
>>>> +	 */
>>>> +	close(fd);
>>>
>>> It still seems odd to explicitly want to ignore close() return values.
>>>
>>> I realize that we do in (too many) existing places, but why wouldn't we
>>> want to e.g. catch an I/O error here early?
>>
>> What exactly do you propose we do here if there is an I/O error
>> during close()?
>
> We created the file to trigger a watch event, but now we have a
> reason to suspect that the wished-for watch event may not come.
>
> We only did so to know that when we're up to date.  Now we may never
> know?  We may go without realizing we are already up to date a bit
> longer than the reality?
>
> How much damage would it cause us to miss a watch event in this
> case?  Very little?  Is it a thing that sysadmins may care if we see
> too many of, but there is nothing the end user can immediately do
> about?  If it is, perhaps a trace2 event to report it (and other "we
> do not care here" syscalls that fail)?
>

The open(... O_CREAT ...) succeeded, so we actually created a new
file and expect a FS event for it.  That FS event (when seen by the
FS listener thread) will cause our condition to be signaled and allow
this thread to wake up and respond to the client.

The odds of the close() failing on a plain file (after a successful
open()) are very slim.  And there's nothing that we can do about the
failure anyway.  (And we're not relying on an FS event from the
close() succeeding, so it really doesn't matter.)

Technically, it is possible that the daemon could run out of fd's if
this close() fails often, so at some point the daemon might not be
able to create new cookie files.  But the daemon currently defaults to
sending a trivial response to the client -- if this turns out to be a
real issue, we could have the daemon restart or something, but I'm not
going to worry about that right now.

A failure in unlink() is a little more interesting.  This would mean
that a stale cookie file would be left in the cookie directory (and
waste a little disk space).  But that is not likely either (for a
plain file that we just created).  Since we're not relying on the FS
event for the unlink(), the failure here won't block the current
thread either.  Deleting stale cookie files is something that we could
try to address in the future if it turns out to be a problem.

Jeff

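A minimal sketch of the middle ground discussed in this thread: treat only a failed open() as fatal, and merely report close() and unlink() failures instead of silently dropping them. The helper name `touch_cookie` and the use of stderr are assumptions made for illustration; the patch itself ignores both return values, and a real daemon would more likely emit a trace2 event as suggested earlier in the thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: create a cookie file and
 * remove it again so that a file system watcher sees an event on `path`.
 *
 * Returns -1 if the file could not be created (no event can be expected),
 * otherwise the number of non-fatal failures from close()/unlink() (0..2).
 */
int touch_cookie(const char *path)
{
	int soft_failures = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

	if (fd < 0) {
		fprintf(stderr, "error: could not create cookie '%s': %s\n",
			path, strerror(errno));
		return -1;
	}

	/* The create event was already triggered; later failures only get logged. */
	if (close(fd) < 0) {
		fprintf(stderr, "warning: close of cookie '%s' failed: %s\n",
			path, strerror(errno));
		soft_failures++;
	}
	if (unlink(path) < 0) {
		fprintf(stderr, "warning: stale cookie '%s' left behind: %s\n",
			path, strerror(errno));
		soft_failures++;
	}

	return soft_failures;
}
```

With this shape the caller can still wait for the watch event whenever the return value is not -1, which matches the reasoning that the wait only depends on the successful open().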
diff --git a/builtin/fsmonitor--daemon.c b/builtin/fsmonitor--daemon.c
index 97ca2a356e5..02a99ce98a2 100644
--- a/builtin/fsmonitor--daemon.c
+++ b/builtin/fsmonitor--daemon.c
@@ -109,14 +109,14 @@ static int do_as_client__status(void)
 
 enum fsmonitor_cookie_item_result {
 	FCIR_ERROR = -1, /* could not create cookie file ? */
-	FCIR_INIT = 0,
+	FCIR_INIT,
 	FCIR_SEEN,
 	FCIR_ABORT,
 };
 
 struct fsmonitor_cookie_item {
 	struct hashmap_entry entry;
-	const char *name;
+	char *name;
 	enum fsmonitor_cookie_item_result result;
 };
 
@@ -166,37 +166,44 @@ static enum fsmonitor_cookie_item_result with_lock__wait_for_cookie(
 	 * that the listener thread has seen it.
 	 */
 	fd = open(cookie_pathname.buf, O_WRONLY | O_CREAT | O_EXCL, 0600);
-	if (fd >= 0) {
-		close(fd);
-		unlink(cookie_pathname.buf);
-
-		/*
-		 * Technically, this is an infinite wait (well, unless another
-		 * thread sends us an abort).  I'd like to change this to
-		 * use `pthread_cond_timedwait()` and return an error/timeout
-		 * and let the caller do the trivial response thing, but we
-		 * don't have that routine in our thread-utils.
-		 *
-		 * After extensive beta testing I'm not really worried about
-		 * this.  Also note that the above open() and unlink() calls
-		 * will cause at least two FS events on that path, so the odds
-		 * of getting stuck are pretty slim.
-		 */
-		while (cookie->result == FCIR_INIT)
-			pthread_cond_wait(&state->cookies_cond,
-					  &state->main_lock);
-	} else {
+	if (fd < 0) {
 		error_errno(_("could not create fsmonitor cookie '%s'"),
 			    cookie->name);
 
 		cookie->result = FCIR_ERROR;
+		goto done;
 	}
 
+	/*
+	 * Technically, close() and unlink() can fail, but we don't
+	 * care here.  We only created the file to trigger a watch
+	 * event from the FS to know that when we're up to date.
+	 */
+	close(fd);
+	unlink(cookie_pathname.buf);
+
+	/*
+	 * Technically, this is an infinite wait (well, unless another
+	 * thread sends us an abort).  I'd like to change this to
+	 * use `pthread_cond_timedwait()` and return an error/timeout
+	 * and let the caller do the trivial response thing, but we
+	 * don't have that routine in our thread-utils.
+	 *
+	 * After extensive beta testing I'm not really worried about
+	 * this.  Also note that the above open() and unlink() calls
+	 * will cause at least two FS events on that path, so the odds
+	 * of getting stuck are pretty slim.
+	 */
+	while (cookie->result == FCIR_INIT)
+		pthread_cond_wait(&state->cookies_cond,
+				  &state->main_lock);
+
+done:
 	hashmap_remove(&state->cookies, &cookie->entry, NULL);
 
 	result = cookie->result;
 
-	free((char*)cookie->name);
+	free(cookie->name);
 	free(cookie);
 	strbuf_release(&cookie_pathname);
 