Message ID: 7a6a69dfc20c6ff190cb020931c46bf4d88bab59.1612208747.git.gitgitgadget@gmail.com (mailing list archive)
State: New, archived
Series: Simple IPC Mechanism
On Mon, Feb 01, 2021 at 07:45:44PM +0000, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Update `unix_stream_listen()` to take an options structure to override
> default behaviors. This includes the size of the `listen()` backlog
> and whether it should always unlink the socket file before trying to
> create a new one. Also eliminate calls to `die()` if it cannot create
> a socket.

I sent a follow-up on the previous patch, but I think this part about
the die() should be folded in there.

Likewise I think it would probably be easier to follow if we added the
backlog parameter and the unlink options in separate patches. The
backlog thing is small, but the unlink part is subtle and requires
explanation. That's a good sign it might do better in its own commit.

> Normally, `unix_stream_listen()` always tries to `unlink()` the
> socket-path before calling `bind()`. If there is an existing
> server/daemon already bound and listening on that socket-path, our
> `unlink()` would have the effect of disassociating the existing
> server's bound-socket-fd from the socket-path without notifying the
> existing server. The existing server could continue to service
> existing connections (accepted-socket-fd's), but would not receive any
> further new connections (since clients rendezvous via the socket-path).
> The existing server would effectively be offline yet appear to be
> active.
>
> Furthermore, `unix_stream_listen()` creates an opportunity for a brief
> race condition for connecting clients if they try to connect in the
> interval between the forced `unlink()` and the subsequent `bind()` (which
> recreates the socket-path that is bound to a new socket-fd in the current
> process).

OK. I'm still not sure of the endgame here for writing non-racy code to
establish the socket (which is going to require either some atomic
renaming or some dot-locking in the caller). But it's plausible to me
that this option will be a useful primitive.

The implementation looks correct, though here are a few small
observations/questions/nits:

> -int unix_stream_listen(const char *path)
> +int unix_stream_listen(const char *path,
> +		       const struct unix_stream_listen_opts *opts)
>  {
> -	int fd, saved_errno;
> +	int fd = -1;
> +	int saved_errno;
> +	int bind_successful = 0;
> +	int backlog;
>  	struct sockaddr_un sa;
>  	struct unix_sockaddr_context ctx;
>
> -	unlink(path);
> -
>  	if (unix_sockaddr_init(&sa, path, &ctx) < 0)
>  		return -1;

We can return directly here, because we know there is nothing to clean
up. Which I thought meant that here...

> +
>  	fd = socket(AF_UNIX, SOCK_STREAM, 0);
>  	if (fd < 0)
> -		die_errno("unable to create socket");
> +		goto fail;

...we are in the same boat. We did not create a socket, so we can just
return. That makes our cleanup code a bit simpler. But we can't do that,
because unix_sockaddr_init() may have done things that need cleaning up
(like chdir). So what you have here is correct.

IMHO that is all the more reason to push this (and the similar code in
unix_stream_connect() added in patch 13) into the previous patch.

> +	if (opts->force_unlink_before_bind)
> +		unlink(path);
>
>  	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
>  		goto fail;
> +	bind_successful = 1;

And this one needs to mark a flag explicitly, because we have no other
visible way of knowing we need to do the unlink. Makes sense.
> -	if (listen(fd, 5) < 0)
> +	if (opts->listen_backlog_size > 0)
> +		backlog = opts->listen_backlog_size;
> +	else
> +		backlog = 5;
> +	if (listen(fd, backlog) < 0)

The default-to-5 is a bit funny here. We already set the default to 5 in
UNIX_STREAM_LISTEN_OPTS_INIT. Should it be "0" there, so callers can
treat that as "use the default", which we fill in here? It probably
doesn't matter much in practice, but it seems cleaner to have only one
spot with the magic number.

> @@ -114,7 +125,10 @@ int unix_stream_listen(const char *path)
>  fail:
>  	saved_errno = errno;
>  	unix_sockaddr_cleanup(&ctx);
> -	close(fd);
> +	if (fd != -1)
> +		close(fd);
> +	if (bind_successful)
> +		unlink(path);
>  	errno = saved_errno;
>  	return -1;
>  }

Should we unlink before closing? I usually try to undo actions in the
reverse order that they were done. I thought at first it might even
matter here, such that we'd atomically relinquish the name without
having a moment where it still points to a closed socket (which might be
less confusing to somebody else trying to connect). But I guess there
will always be such a moment, because it's not like we would ever
accept() or service a request.

-Peff
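[A minimal sketch of the single-default-location refactor suggested above, assuming the struct from unix-socket.h in this patch; DEFAULT_UNIX_STREAM_LISTEN_BACKLOG and backlog_from_opts() are illustrative names, not taken from the series.]

#define DEFAULT_UNIX_STREAM_LISTEN_BACKLOG 5

/* In unix-socket.h, the initializer no longer carries the magic 5: */
#define UNIX_STREAM_LISTEN_OPTS_INIT \
{ \
	.listen_backlog_size = 0, /* 0 means "use the default" */ \
	.force_unlink_before_bind = 1, \
}

/* In unix_stream_listen(), zero (or less) falls back to the default: */
static int backlog_from_opts(const struct unix_stream_listen_opts *opts)
{
	if (opts->listen_backlog_size > 0)
		return opts->listen_backlog_size;
	return DEFAULT_UNIX_STREAM_LISTEN_BACKLOG;
}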
On 2/2/21 5:14 AM, Jeff King wrote:
> On Mon, Feb 01, 2021 at 07:45:44PM +0000, Jeff Hostetler via GitGitGadget wrote:
>
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Update `unix_stream_listen()` to take an options structure to override
>> default behaviors. This includes the size of the `listen()` backlog
>> and whether it should always unlink the socket file before trying to
>> create a new one. Also eliminate calls to `die()` if it cannot create
>> a socket.
>
> I sent a follow-up on the previous patch, but I think this part about
> the die() should be folded in there.
>
> Likewise I think it would probably be easier to follow if we added the
> backlog parameter and the unlink options in separate patches. The
> backlog thing is small, but the unlink part is subtle and requires
> explanation. That's a good sign it might do better in its own commit.

Yes, it helped having them in 2 patches, each with 1 concern.

>
>> Normally, `unix_stream_listen()` always tries to `unlink()` the
>> socket-path before calling `bind()`. If there is an existing
>> server/daemon already bound and listening on that socket-path, our
>> `unlink()` would have the effect of disassociating the existing
>> server's bound-socket-fd from the socket-path without notifying the
>> existing server. The existing server could continue to service
>> existing connections (accepted-socket-fd's), but would not receive any
>> further new connections (since clients rendezvous via the socket-path).
>> The existing server would effectively be offline yet appear to be
>> active.
>>
>> Furthermore, `unix_stream_listen()` creates an opportunity for a brief
>> race condition for connecting clients if they try to connect in the
>> interval between the forced `unlink()` and the subsequent `bind()` (which
>> recreates the socket-path that is bound to a new socket-fd in the current
>> process).
>
> OK. I'm still not sure of the endgame here for writing non-racy code to
> establish the socket (which is going to require either some atomic
> renaming or some dot-locking in the caller). But it's plausible to me
> that this option will be a useful primitive.

In part 14/14 in `ipc-unix-sockets.c:create_listener_socket()` I have
code in the calling layer to (try to) handle both the startup races
and basic collisions with existing long-running servers already using
the socket.

But you're right, it might be good to revisit that as a primitive at
this layer. We only have 1 other caller right now and I don't know
enough about `credential-cache--daemon` to know if it would benefit
from this or not.

>
> The implementation looks correct, though here are a few small
> observations/questions/nits:
>
>> -int unix_stream_listen(const char *path)
>> +int unix_stream_listen(const char *path,
>> +		       const struct unix_stream_listen_opts *opts)
>>  {
>> -	int fd, saved_errno;
>> +	int fd = -1;
>> +	int saved_errno;
>> +	int bind_successful = 0;
>> +	int backlog;
>>  	struct sockaddr_un sa;
>>  	struct unix_sockaddr_context ctx;
>>
>> -	unlink(path);
>> -
>>  	if (unix_sockaddr_init(&sa, path, &ctx) < 0)
>>  		return -1;
>
> We can return directly here, because we know there is nothing to clean
> up. Which I thought meant that here...
>
>> +
>>  	fd = socket(AF_UNIX, SOCK_STREAM, 0);
>>  	if (fd < 0)
>> -		die_errno("unable to create socket");
>> +		goto fail;
>
> ...we are in the same boat. We did not create a socket, so we can just
> return. That makes our cleanup code a bit simpler. But we can't do that,
> because unix_sockaddr_init() may have done things that need cleaning up
> (like chdir). So what you have here is correct.
>
> IMHO that is all the more reason to push this (and the similar code in
> unix_stream_connect() added in patch 13) into the previous patch.

Agreed.

>
>> +	if (opts->force_unlink_before_bind)
>> +		unlink(path);
>>
>>  	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
>>  		goto fail;
>> +	bind_successful = 1;
>
> And this one needs to mark a flag explicitly, because we have no other
> visible way of knowing we need to do the unlink. Makes sense.
>
>> -	if (listen(fd, 5) < 0)
>> +	if (opts->listen_backlog_size > 0)
>> +		backlog = opts->listen_backlog_size;
>> +	else
>> +		backlog = 5;
>> +	if (listen(fd, backlog) < 0)
>
> The default-to-5 is a bit funny here. We already set the default to 5 in
> UNIX_STREAM_LISTEN_OPTS_INIT. Should it be "0" there, so callers can
> treat that as "use the default", which we fill in here? It probably
> doesn't matter much in practice, but it seems cleaner to have only one
> spot with the magic number.

I'll refactor this a bit.

>
>> @@ -114,7 +125,10 @@ int unix_stream_listen(const char *path)
>>  fail:
>>  	saved_errno = errno;
>>  	unix_sockaddr_cleanup(&ctx);
>> -	close(fd);
>> +	if (fd != -1)
>> +		close(fd);
>> +	if (bind_successful)
>> +		unlink(path);
>>  	errno = saved_errno;
>>  	return -1;
>>  }
>
> Should we unlink before closing? I usually try to undo actions in the
> reverse order that they were done. I thought at first it might even
> matter here, such that we'd atomically relinquish the name without
> having a moment where it still points to a closed socket (which might be
> less confusing to somebody else trying to connect). But I guess there
> will always be such a moment, because it's not like we would ever
> accept() or service a request.

I'm not sure it matters, but it does look better to unwind things in
reverse order. And yes, unlinking first is a little bit safer.

>
> -Peff
>
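[To make the "undo in reverse order" point concrete, here is a small compilable illustration; listen_on_unix_socket() is a hypothetical helper, not code from the series, and it omits the unix_sockaddr_context handling entirely. The actions are socket() then bind(), so the failure path unlinks the bound name before closing the descriptor.]

#include <sys/socket.h>
#include <sys/un.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

static int listen_on_unix_socket(const char *path, int backlog)
{
	struct sockaddr_un sa;
	int fd, saved_errno;
	int bound = 0;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;	/* nothing created yet; just report the error */

	memset(&sa, 0, sizeof(sa));
	sa.sun_family = AF_UNIX;
	strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);

	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		goto fail;
	bound = 1;

	if (listen(fd, backlog) < 0)
		goto fail;

	return fd;

fail:
	saved_errno = errno;
	if (bound)
		unlink(path);	/* undo bind() first: relinquish the name... */
	close(fd);		/* ...then undo socket() */
	errno = saved_errno;
	return -1;
}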
On Fri, Feb 05, 2021 at 06:28:13PM -0500, Jeff Hostetler wrote:

> > OK. I'm still not sure of the endgame here for writing non-racy code to
> > establish the socket (which is going to require either some atomic
> > renaming or some dot-locking in the caller). But it's plausible to me
> > that this option will be a useful primitive.
>
> In part 14/14 in `ipc-unix-sockets.c:create_listener_socket()` I have
> code in the calling layer to (try to) handle both the startup races
> and basic collisions with existing long-running servers already using
> the socket.

There you make a temp socket and then try to rename it into place. But
because rename() overwrites the destination, it still seems like two
creating processes can race each other. Something like:

  0. There's no "foo" socket (or maybe there is a stale one that
     nobody's listening on).

  1. Process A wants to become the listener. So it creates foo.A.

  2. Process B likewise. It creates foo.B.

  3. Process A renames foo.A to foo. It believes it will now service
     clients.

  4. Process B renames foo.B to foo. Now process A is stranded but
     doesn't realize it.

I.e., I don't think this is much different than an unlink+create
strategy. You've eliminated the window where a process C shows up during
steps 3 and 4 and sees no socket (because somebody else is in the midst
of a non-atomic unlink+create operation). But there's no atomicity
between the "ping the socket" and "create the socket" steps.

> But you're right, it might be good to revisit that as a primitive at
> this layer. We only have 1 other caller right now and I don't know
> enough about `credential-cache--daemon` to know if it would benefit
> from this or not.

Yeah, having seen patch 14, it looks like your only new caller always
sets the new unlink option to 1. So it might not be worth making it
optional if you don't need it (especially because the rename trick,
assuming it's portable, is superior to unlink+create; and you'd always
be fine with an unlink on the temp socket).

The call in credential-cache--daemon is definitely racy. It's pretty
much the same thing: it pings the socket to see if it's alive, but is
still susceptible to the problem above. I was never too concerned
about it, since the whole point of the daemon is to hang around until
its contents expire. If it loses the race and nobody contacts it, the
worst case is it waits 30 seconds for somebody to give it data before
exiting. It would benefit slightly from switching to the rename
strategy, but the bigger race would remain.

-Peff
On 2/9/21 11:32 AM, Jeff King wrote:
> On Fri, Feb 05, 2021 at 06:28:13PM -0500, Jeff Hostetler wrote:
>
>>> OK. I'm still not sure of the endgame here for writing non-racy code to
>>> establish the socket (which is going to require either some atomic
>>> renaming or some dot-locking in the caller). But it's plausible to me
>>> that this option will be a useful primitive.
>>
>> In part 14/14 in `ipc-unix-sockets.c:create_listener_socket()` I have
>> code in the calling layer to (try to) handle both the startup races
>> and basic collisions with existing long-running servers already using
>> the socket.
>
> There you make a temp socket and then try to rename it into place. But
> because rename() overwrites the destination, it still seems like two
> creating processes can race each other. Something like:
>
>   0. There's no "foo" socket (or maybe there is a stale one that
>      nobody's listening on).
>
>   1. Process A wants to become the listener. So it creates foo.A.
>
>   2. Process B likewise. It creates foo.B.
>
>   3. Process A renames foo.A to foo. It believes it will now service
>      clients.
>
>   4. Process B renames foo.B to foo. Now process A is stranded but
>      doesn't realize it.

Yeah, in my version two processes could still create uniquely named
sockets and then do the rename trick. But they capture the inode
number of the socket before they do that. They periodically lstat
the socket to see if the inode number has changed and if so, assume
it has been stolen from them. (A bit of a hack, I admit.)

And I was assuming that 2 servers starting at about the same time
are effectively equivalent -- it doesn't matter which one dies, since
they both should have the same amount of cached state. Unlike the
case where a long-running server (with lots of state) is replaced by
a newcomer.

> I.e., I don't think this is much different than an unlink+create
> strategy. You've eliminated the window where a process C shows up during
> steps 3 and 4 and sees no socket (because somebody else is in the midst
> of a non-atomic unlink+create operation). But there's no atomicity
> between the "ping the socket" and "create the socket" steps.
>
>> But you're right, it might be good to revisit that as a primitive at
>> this layer. We only have 1 other caller right now and I don't know
>> enough about `credential-cache--daemon` to know if it would benefit
>> from this or not.
>
> Yeah, having seen patch 14, it looks like your only new caller always
> sets the new unlink option to 1. So it might not be worth making it
> optional if you don't need it (especially because the rename trick,
> assuming it's portable, is superior to unlink+create; and you'd always
> be fine with an unlink on the temp socket).

I am wondering if we can use the .LOCK file magic to our advantage
here (in sort of an off-label use). If we have the server create a
lockfile "<path>.LOCK" and if successful leave it open/locked for the
life of the server (rather than immediately renaming it onto <path>)
and let the normal shutdown code rollback/delete the lockfile in the
cleanup/atexit.

If the server successfully creates the lockfile, then unlink+create
the socket at <path>.

That would give us the unique/exclusive creation (on the lock) that
we need. Then wrap that with all the edge case cleanup code to
create/delete/manage the peer socket. Basically if the lock exists,
there should be a live server listening to the socket (unless there
was a crash...).

And yes, then I don't think I need the `preserve_existing` bit in the
opts struct.
>
> The call in credential-cache--daemon is definitely racy. It's pretty
> much the same thing: it pings the socket to see if it's alive, but is
> still susceptible to the problem above. I was never too concerned
> about it, since the whole point of the daemon is to hang around until
> its contents expire. If it loses the race and nobody contacts it, the
> worst case is it waits 30 seconds for somebody to give it data before
> exiting. It would benefit slightly from switching to the rename
> strategy, but the bigger race would remain.
>
> -Peff
>
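[A minimal sketch of the lstat()-based theft check described above, for readers following along; the struct and helper names are made up for illustration and this is not the code from part 14/14. The idea: capture the device/inode of the socket right after creating it, then periodically compare.]

#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>

struct socket_identity {
	char path[256];
	dev_t dev;
	ino_t ino;
};

/* Remember which inode we bound; call this right after creating the socket. */
static int capture_socket_identity(struct socket_identity *sid, const char *path)
{
	struct stat st;

	if (lstat(path, &st) < 0)
		return -1;
	strncpy(sid->path, path, sizeof(sid->path) - 1);
	sid->path[sizeof(sid->path) - 1] = '\0';
	sid->dev = st.st_dev;
	sid->ino = st.st_ino;
	return 0;
}

/* Returns 1 if the path no longer refers to the socket we created. */
static int socket_was_stolen(const struct socket_identity *sid)
{
	struct stat st;

	if (lstat(sid->path, &st) < 0)
		return 1;	/* gone entirely: treat as stolen */
	return st.st_dev != sid->dev || st.st_ino != sid->ino;
}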
On Tue, Feb 09, 2021 at 12:39:22PM -0500, Jeff Hostetler wrote:

> Yeah, in my version two processes could still create uniquely named
> sockets and then do the rename trick. But they capture the inode
> number of the socket before they do that. They periodically lstat
> the socket to see if the inode number has changed and if so, assume
> it has been stolen from them. (A bit of a hack, I admit.)

OK, that makes more sense. I saw the mention of the inode stuff in a
comment, but I didn't see it in the code (I guess if it's a periodic
check it's not in that initial socket creation function).

> And I was assuming that 2 servers starting at about the same time
> are effectively equivalent -- it doesn't matter which one dies, since
> they both should have the same amount of cached state. Unlike the
> case where a long-running server (with lots of state) is replaced by
> a newcomer.

Yeah, I agree with that notion in general. I do think it would be easier
to reason about if the creation were truly race-proof (probably with a
dot-lock; see below), rather than the later "check if we got replaced"
thing. OTOH, that "check" strategy covers a variety of cases (including
that somebody tried to ping us, decided we weren't alive due to a
timeout or some other system reason, and then replaced our socket).

Another strategy there could be having the daemon just decide to quit if
nobody contacts it for N time units. It is, after all, a cache. Even if
nobody replaces the socket, it probably makes sense to eventually decide
that the memory we're holding isn't being put to good use.

> I am wondering if we can use the .LOCK file magic to our advantage
> here (in sort of an off-label use). If we have the server create a
> lockfile "<path>.LOCK" and if successful leave it open/locked for the
> life of the server (rather than immediately renaming it onto <path>)
> and let the normal shutdown code rollback/delete the lockfile in the
> cleanup/atexit.
>
> If the server successfully creates the lockfile, then unlink+create
> the socket at <path>.

I don't even think this is off-label. Though the normal use is for the
.lock file to get renamed into place as the official file, there are a
few other places where we use it solely for mutual exclusion. You just
always end with rollback_lock_file(), and never "commit" it.

So something like:

  1. Optimistically see if socket "foo" is present and accepting
     connections.

  2. If not, then take "foo.lock". If somebody else is holding it, loop
     with a timeout waiting for them to come alive.

  3. Assuming we got the lock, then either unlink+create the socket as
     "foo", or rename-into-place. I don't think it matters that much
     which.

  4. Rollback "foo.lock", unlinking it.

Then one process wins the lock and creates the socket, while any
simultaneous creators spin in step 2, and eventually connect to the
winner.

> That would give us the unique/exclusive creation (on the lock) that
> we need. Then wrap that with all the edge case cleanup code to
> create/delete/manage the peer socket. Basically if the lock exists,
> there should be a live server listening to the socket (unless there
> was a crash...).

I think you'd want to delete the lock as soon as you're done with the
setup. That reduces the chances that a dead server (e.g., killed by a
power outage without the chance to clean up after itself) leaves a stale
lock sitting around.

-Peff
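[A rough sketch of that four-step dance, assuming Git's lockfile API (hold_lock_file_for_update_timeout(), rollback_lock_file()) and the unix_stream_listen() signature from this series; try_become_listener() is a hypothetical helper, not code from any posted patch.]

#include "git-compat-util.h"
#include "lockfile.h"
#include "unix-socket.h"

/* Returns a listening fd, or -1 if we should not (or could not) take over. */
static int try_become_listener(const char *path)
{
	struct unix_stream_listen_opts opts = UNIX_STREAM_LISTEN_OPTS_INIT;
	struct lock_file lk = LOCK_INIT;
	int test_fd, listen_fd;

	/* 1. Optimistically see if a live server already answers on "path". */
	test_fd = unix_stream_connect(path);
	if (test_fd >= 0) {
		close(test_fd);
		errno = EADDRINUSE;
		return -1;
	}

	/* 2. Take "<path>.lock"; wait briefly for any racing creator. */
	if (hold_lock_file_for_update_timeout(&lk, path, 0, 1000) < 0)
		return -1;

	/*
	 * 3. We hold the lock, so nobody else is mid-create: unlink any
	 *    stale socket and bind a fresh one at "path" (the default opts
	 *    already force the unlink before bind).
	 */
	listen_fd = unix_stream_listen(path, &opts);

	/* 4. Roll back (unlink) the lock as soon as setup is done; never commit it. */
	rollback_lock_file(&lk);

	return listen_fd;
}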
On 2/10/21 10:55 AM, Jeff King wrote:
> On Tue, Feb 09, 2021 at 12:39:22PM -0500, Jeff Hostetler wrote:
>
>> Yeah, in my version two processes could still create uniquely named
>> sockets and then do the rename trick. But they capture the inode
>> number of the socket before they do that. They periodically lstat
>> the socket to see if the inode number has changed and if so, assume
>> it has been stolen from them. (A bit of a hack, I admit.)
>
> OK, that makes more sense. I saw the mention of the inode stuff in a
> comment, but I didn't see it in the code (I guess if it's a periodic
> check it's not in that initial socket creation function).

Yeah, there's a very slow poll(2) loop in the listen/accept thread that
watches for that and for new connections (and quit messages).

>> And I was assuming that 2 servers starting at about the same time
>> are effectively equivalent -- it doesn't matter which one dies, since
>> they both should have the same amount of cached state. Unlike the
>> case where a long-running server (with lots of state) is replaced by
>> a newcomer.
>
> Yeah, I agree with that notion in general. I do think it would be easier
> to reason about if the creation were truly race-proof (probably with a
> dot-lock; see below), rather than the later "check if we got replaced"
> thing. OTOH, that "check" strategy covers a variety of cases (including
> that somebody tried to ping us, decided we weren't alive due to a
> timeout or some other system reason, and then replaced our socket).
>
> Another strategy there could be having the daemon just decide to quit if
> nobody contacts it for N time units. It is, after all, a cache. Even if
> nobody replaces the socket, it probably makes sense to eventually decide
> that the memory we're holding isn't being put to good use.

I have the poll(2) loop set to recheck the inode for theft every
60 seconds (randomly chosen). Assuming the socket isn't stolen, I want
to leave any thoughts of an auto-shutdown to the application layer
above it. My next patch series will use this ipc mechanism to build an
FSMonitor daemon that will watch the filesystem for changes and then
be able to quickly respond to a `git status`, so it is important that
it be allowed to run without any clients for a while (such as during a
build). Yes, memory concerns are important, so I do want it to
auto-shutdown if the socket is stolen (or the workdir is deleted).

>
>> I am wondering if we can use the .LOCK file magic to our advantage
>> here (in sort of an off-label use). If we have the server create a
>> lockfile "<path>.LOCK" and if successful leave it open/locked for the
>> life of the server (rather than immediately renaming it onto <path>)
>> and let the normal shutdown code rollback/delete the lockfile in the
>> cleanup/atexit.
>>
>> If the server successfully creates the lockfile, then unlink+create
>> the socket at <path>.
>
> I don't even think this is off-label. Though the normal use is for the
> .lock file to get renamed into place as the official file, there are a
> few other places where we use it solely for mutual exclusion. You just
> always end with rollback_lock_file(), and never "commit" it.
>
> So something like:
>
>   1. Optimistically see if socket "foo" is present and accepting
>      connections.
>
>   2. If not, then take "foo.lock". If somebody else is holding it, loop
>      with a timeout waiting for them to come alive.
>
>   3. Assuming we got the lock, then either unlink+create the socket as
>      "foo", or rename-into-place. I don't think it matters that much
>      which.
>
>   4. Rollback "foo.lock", unlinking it.
>
> Then one process wins the lock and creates the socket, while any
> simultaneous creators spin in step 2, and eventually connect to the
> winner.
>
>> That would give us the unique/exclusive creation (on the lock) that
>> we need. Then wrap that with all the edge case cleanup code to
>> create/delete/manage the peer socket. Basically if the lock exists,
>> there should be a live server listening to the socket (unless there
>> was a crash...).
>
> I think you'd want to delete the lock as soon as you're done with the
> setup. That reduces the chances that a dead server (e.g., killed by a
> power outage without the chance to clean up after itself) leaves a stale
> lock sitting around.

Thanks, this helps. I've got a version now that is a slight variation on
what you have here; it seems to work nicely and has the short-lived lock
file. I'll post it shortly.

Jeff
diff --git a/builtin/credential-cache--daemon.c b/builtin/credential-cache--daemon.c
index c61f123a3b8..4c6c89ab0de 100644
--- a/builtin/credential-cache--daemon.c
+++ b/builtin/credential-cache--daemon.c
@@ -203,9 +203,10 @@ static int serve_cache_loop(int fd)
 
 static void serve_cache(const char *socket_path, int debug)
 {
+	struct unix_stream_listen_opts opts = UNIX_STREAM_LISTEN_OPTS_INIT;
 	int fd;
 
-	fd = unix_stream_listen(socket_path);
+	fd = unix_stream_listen(socket_path, &opts);
 	if (fd < 0)
 		die_errno("unable to bind to '%s'", socket_path);
 
diff --git a/unix-socket.c b/unix-socket.c
index ef2aeb46bcd..8bcef18ea55 100644
--- a/unix-socket.c
+++ b/unix-socket.c
@@ -88,24 +88,35 @@ int unix_stream_connect(const char *path)
 	return -1;
 }
 
-int unix_stream_listen(const char *path)
+int unix_stream_listen(const char *path,
+		       const struct unix_stream_listen_opts *opts)
 {
-	int fd, saved_errno;
+	int fd = -1;
+	int saved_errno;
+	int bind_successful = 0;
+	int backlog;
 	struct sockaddr_un sa;
 	struct unix_sockaddr_context ctx;
 
-	unlink(path);
-
 	if (unix_sockaddr_init(&sa, path, &ctx) < 0)
 		return -1;
+
 	fd = socket(AF_UNIX, SOCK_STREAM, 0);
 	if (fd < 0)
-		die_errno("unable to create socket");
+		goto fail;
+
+	if (opts->force_unlink_before_bind)
+		unlink(path);
 
 	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
 		goto fail;
+	bind_successful = 1;
 
-	if (listen(fd, 5) < 0)
+	if (opts->listen_backlog_size > 0)
+		backlog = opts->listen_backlog_size;
+	else
+		backlog = 5;
+	if (listen(fd, backlog) < 0)
 		goto fail;
 
 	unix_sockaddr_cleanup(&ctx);
@@ -114,7 +125,10 @@ int unix_stream_listen(const char *path)
 fail:
 	saved_errno = errno;
 	unix_sockaddr_cleanup(&ctx);
-	close(fd);
+	if (fd != -1)
+		close(fd);
+	if (bind_successful)
+		unlink(path);
 	errno = saved_errno;
 	return -1;
 }
diff --git a/unix-socket.h b/unix-socket.h
index e271aeec5a0..c28372ef48e 100644
--- a/unix-socket.h
+++ b/unix-socket.h
@@ -1,7 +1,19 @@
 #ifndef UNIX_SOCKET_H
 #define UNIX_SOCKET_H
 
+struct unix_stream_listen_opts {
+	int listen_backlog_size;
+	unsigned int force_unlink_before_bind:1;
+};
+
+#define UNIX_STREAM_LISTEN_OPTS_INIT \
+{ \
+	.listen_backlog_size = 5, \
+	.force_unlink_before_bind = 1, \
+}
+
 int unix_stream_connect(const char *path);
-int unix_stream_listen(const char *path);
+int unix_stream_listen(const char *path,
+		       const struct unix_stream_listen_opts *opts);
 
 #endif /* UNIX_SOCKET_H */
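[For reference, a hypothetical caller of the reworked API above (not part of the patch): start from UNIX_STREAM_LISTEN_OPTS_INIT, override what you need, and handle errors yourself now that unix_stream_listen() no longer die()s. start_listening() and the backlog value are illustrative only.]

#include "git-compat-util.h"
#include "unix-socket.h"

static int start_listening(const char *socket_path)
{
	struct unix_stream_listen_opts opts = UNIX_STREAM_LISTEN_OPTS_INIT;
	int fd;

	opts.listen_backlog_size = 128;	/* expect bursts of connecting clients */

	fd = unix_stream_listen(socket_path, &opts);
	if (fd < 0)
		die_errno("unable to bind to '%s'", socket_path);
	return fd;
}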