
[v2,11/14] unix-socket: add options to unix_stream_listen()

Message ID 7a6a69dfc20c6ff190cb020931c46bf4d88bab59.1612208747.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Series Simple IPC Mechanism

Commit Message

Jeff Hostetler Feb. 1, 2021, 7:45 p.m. UTC
From: Jeff Hostetler <jeffhost@microsoft.com>

Update `unix_stream_listen()` to take an options structure to override
default behaviors.  This includes the size of the `listen()` backlog
and whether it should always unlink the socket file before trying to
create a new one.  Also eliminate calls to `die()` if it cannot create
a socket.

Normally, `unix_stream_listen()` always tries to `unlink()` the
socket-path before calling `bind()`.  If there is an existing
server/daemon already bound and listening on that socket-path, our
`unlink()` would have the effect of disassociating the existing
server's bound-socket-fd from the socket-path without notifying the
existing server.  The existing server could continue to service
existing connections (accepted-socket-fd's), but would not receive any
further new connections (since clients rendezvous via the socket-path).
The existing server would effectively be offline yet still appear to be
active.

Furthermore, `unix_stream_listen()` creates an opportunity for a brief
race condition for connecting clients if they try to connect in the
interval between the forced `unlink()` and the subsequent `bind()` (which
recreates the socket-path that is bound to a new socket-fd in the current
process).

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/credential-cache--daemon.c |  3 ++-
 unix-socket.c                      | 28 +++++++++++++++++++++-------
 unix-socket.h                      | 14 +++++++++++++-
 3 files changed, 36 insertions(+), 9 deletions(-)

Comments

Jeff King Feb. 2, 2021, 10:14 a.m. UTC | #1
On Mon, Feb 01, 2021 at 07:45:44PM +0000, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Update `unix_stream_listen()` to take an options structure to override
> default behaviors.  This includes the size of the `listen()` backlog
> and whether it should always unlink the socket file before trying to
> create a new one.  Also eliminate calls to `die()` if it cannot create
> a socket.

I sent a follow-up on the previous patch, but I think this part about
the die() should be folded in there.

Likewise I think it would probably be easier to follow if we added the
backlog parameter and the unlink options in separate patches. The
backlog thing is small, but the unlink part is subtle and requires
explanation. That's a good sign it might do better in its own commit.

> Normally, `unix_stream_listen()` always tries to `unlink()` the
> socket-path before calling `bind()`.  If there is an existing
> server/daemon already bound and listening on that socket-path, our
> `unlink()` would have the effect of disassociating the existing
> server's bound-socket-fd from the socket-path without notifying the
> existing server.  The existing server could continue to service
> existing connections (accepted-socket-fd's), but would not receive any
> further new connections (since clients rendezvous via the socket-path).
> The existing server would effectively be offline yet still appear to be
> active.
> 
> Furthermore, `unix_stream_listen()` creates an opportunity for a brief
> race condition for connecting clients if they try to connect in the
> interval between the forced `unlink()` and the subsequent `bind()` (which
> recreates the socket-path that is bound to a new socket-fd in the current
> process).

OK. I'm still not sure of the endgame here for writing non-racy code to
establish the socket (which is going to require either some atomic
renaming or some dot-locking in the caller).  But it's plausible to me
that this option will be a useful primitive.

The implementation looks correct, though here are a few small
observations/questions/nits:

> -int unix_stream_listen(const char *path)
> +int unix_stream_listen(const char *path,
> +		       const struct unix_stream_listen_opts *opts)
>  {
> -	int fd, saved_errno;
> +	int fd = -1;
> +	int saved_errno;
> +	int bind_successful = 0;
> +	int backlog;
>  	struct sockaddr_un sa;
>  	struct unix_sockaddr_context ctx;
>  
> -	unlink(path);
> -
>  	if (unix_sockaddr_init(&sa, path, &ctx) < 0)
>  		return -1;

We can return directly here, because we know there is nothing to clean
up. Which I thought meant that here...

> +
>  	fd = socket(AF_UNIX, SOCK_STREAM, 0);
>  	if (fd < 0)
> -		die_errno("unable to create socket");
> +		goto fail;

...we are in the same boat. We did not create a socket, so we can just
return. That makes our cleanup code a bit simpler. But we can't do that,
because unix_sockaddr_init() may have done things that need cleaning up
(like chdir). So what you have here is correct.

IMHO that is all the more reason to push this (and the similar code in
unix_stream_connect() added in patch 13) into the previous patch.

> +	if (opts->force_unlink_before_bind)
> +		unlink(path);
>  
>  	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
>  		goto fail;
> +	bind_successful = 1;

And this one needs to mark a flag explicitly, because we have no other
visible way of knowing we need to do the unlink. Makes sense.

> -	if (listen(fd, 5) < 0)
> +	if (opts->listen_backlog_size > 0)
> +		backlog = opts->listen_backlog_size;
> +	else
> +		backlog = 5;
> +	if (listen(fd, backlog) < 0)

The default-to-5 is a bit funny here. We already set the default to 5 in
UNIX_STREAM_LISTEN_OPTS_INIT. Should it be "0" there, so callers can
treat that as "use the default", which we fill in here? It probably
doesn't matter much in practice, but it seems cleaner to have only one
spot with the magic number.
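
Something like this, perhaps (untested sketch; only the names taken
from the patch are real, the rest is illustrative):

  #define DEFAULT_UNIX_STREAM_LISTEN_BACKLOG 5

  #define UNIX_STREAM_LISTEN_OPTS_INIT \
  { \
          .listen_backlog_size = 0, /* 0 means "use the default" */ \
          .force_unlink_before_bind = 1, \
  }

  /* ...and in unix_stream_listen(), the only spot with the magic number: */
          backlog = opts->listen_backlog_size > 0 ?
                    opts->listen_backlog_size :
                    DEFAULT_UNIX_STREAM_LISTEN_BACKLOG;
          if (listen(fd, backlog) < 0)
                  goto fail;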

> @@ -114,7 +125,10 @@ int unix_stream_listen(const char *path)
>  fail:
>  	saved_errno = errno;
>  	unix_sockaddr_cleanup(&ctx);
> -	close(fd);
> +	if (fd != -1)
> +		close(fd);
> +	if (bind_successful)
> +		unlink(path);
>  	errno = saved_errno;
>  	return -1;
>  }

Should we unlink before closing? I usually try to undo actions in the
reverse order that they were done. I thought at first it might even
matter here, such that we'd atomically relinquish the name without
having a moment where it still points to a closed socket (which might be
less confusing to somebody else trying to connect). But I guess there
will always be such a moment, because it's not like we would ever
accept() or service a request.
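
I.e., something like this in the fail path (just swapping the unlink
and the close relative to the patch; untested):

  fail:
          saved_errno = errno;
          unix_sockaddr_cleanup(&ctx);
          if (bind_successful)
                  unlink(path);   /* release the name first... */
          if (fd != -1)
                  close(fd);      /* ...then the descriptor */
          errno = saved_errno;
          return -1;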

-Peff
Jeff Hostetler Feb. 5, 2021, 11:28 p.m. UTC | #2
On 2/2/21 5:14 AM, Jeff King wrote:
> On Mon, Feb 01, 2021 at 07:45:44PM +0000, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Update `unix_stream_listen()` to take an options structure to override
>> default behaviors.  This includes the size of the `listen()` backlog
>> and whether it should always unlink the socket file before trying to
>> create a new one.  Also eliminate calls to `die()` if it cannot create
>> a socket.
> 
> I sent a follow-up on the previous patch, but I think this part about
> the die() should be folded in there.
> 
> Likewise I think it would probably be easier to follow if we added the
> backlog parameter and the unlink options in separate patches. The
> backlog thing is small, but the unlink part is subtle and requires
> explanation. That's a good sign it might do better in its own commit.

Yes, having them in 2 patches, each with 1 concern, helped.

> 
>> Normally, `unix_stream_listen()` always tries to `unlink()` the
>> socket-path before calling `bind()`.  If there is an existing
>> server/daemon already bound and listening on that socket-path, our
>> `unlink()` would have the effect of disassociating the existing
>> server's bound-socket-fd from the socket-path without notifying the
>> existing server.  The existing server could continue to service
>> existing connections (accepted-socket-fd's), but would not receive any
>> further new connections (since clients rendezvous via the socket-path).
>> The existing server would effectively be offline yet still appear to be
>> active.
>>
>> Furthermore, `unix_stream_listen()` creates an opportunity for a brief
>> race condition for connecting clients if they try to connect in the
>> interval between the forced `unlink()` and the subsequent `bind()` (which
>> recreates the socket-path that is bound to a new socket-fd in the current
>> process).
> 
> OK. I'm still not sure of the endgame here for writing non-racy code to
> establish the socket (which is going to require either some atomic
> renaming or some dot-locking in the caller).  But it's plausible to me
> that this option will be a useful primitive.

In part 14/14 in `ipc-unix-sockets.c:create_listener_socket()` I have
code in the calling layer to (try to) handle both the startup races
and basic collisions with existing long-running servers already using
the socket.

But you're right, it might be good to revisit that as a primitive at
this layer.  We only have 1 other caller right now and I don't know
enough about `credential-cache--daemon` to know if it would benefit
from this or not.

> 
> The implementation looks correct, though here are a few small
> observations/questions/nits:
> 
>> -int unix_stream_listen(const char *path)
>> +int unix_stream_listen(const char *path,
>> +		       const struct unix_stream_listen_opts *opts)
>>   {
>> -	int fd, saved_errno;
>> +	int fd = -1;
>> +	int saved_errno;
>> +	int bind_successful = 0;
>> +	int backlog;
>>   	struct sockaddr_un sa;
>>   	struct unix_sockaddr_context ctx;
>>   
>> -	unlink(path);
>> -
>>   	if (unix_sockaddr_init(&sa, path, &ctx) < 0)
>>   		return -1;
> 
> We can return directly here, because we know there is nothing to clean
> up. Which I thought meant that here...
> 
>> +
>>   	fd = socket(AF_UNIX, SOCK_STREAM, 0);
>>   	if (fd < 0)
>> -		die_errno("unable to create socket");
>> +		goto fail;
> 
> ...we are in the same boat. We did not create a socket, so we can just
> return. That makes our cleanup code a bit simpler. But we can't do that,
> because unix_sockaddr_init() may have done things that need cleaning up
> (like chdir). So what you have here is correct.
> 
> IMHO that is all the more reason to push this (and the similar code in
> unix_stream_connect() added in patch 13) into the previous patch.

Agreed.

> 
>> +	if (opts->force_unlink_before_bind)
>> +		unlink(path);
>>   
>>   	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
>>   		goto fail;
>> +	bind_successful = 1;
> 
> And this one needs to mark a flag explicitly, because we have no other
> visible way of knowing we need to do the unlink. Makes sense.
> 
>> -	if (listen(fd, 5) < 0)
>> +	if (opts->listen_backlog_size > 0)
>> +		backlog = opts->listen_backlog_size;
>> +	else
>> +		backlog = 5;
>> +	if (listen(fd, backlog) < 0)
> 
> The default-to-5 is a bit funny here. We already set the default to 5 in
> UNIX_STREAM_LISTEN_OPTS_INIT. Should it be "0" there, so callers can
> treat that as "use the default", which we fill in here? It probably
> doesn't matter much in practice, but it seems cleaner to have only one
> spot with the magic number.

I'll refactor this a bit.

> 
>> @@ -114,7 +125,10 @@ int unix_stream_listen(const char *path)
>>   fail:
>>   	saved_errno = errno;
>>   	unix_sockaddr_cleanup(&ctx);
>> -	close(fd);
>> +	if (fd != -1)
>> +		close(fd);
>> +	if (bind_successful)
>> +		unlink(path);
>>   	errno = saved_errno;
>>   	return -1;
>>   }
> 
> Should we unlink before closing? I usually try to undo actions in the
> reverse order that they were done. I thought at first it might even
> matter here, such that we'd atomically relinquish the name without
> having a moment where it still points to a closed socket (which might be
> less confusing to somebody else trying to connect). But I guess there
> will always be such a moment, because it's not like we would ever
> accept() or service a request.

I'm not sure it matters, but it does look better to unwind things
in reverse order.  And yes, unlinking first is a little bit safer.

> 
> -Peff
>
Jeff King Feb. 9, 2021, 4:32 p.m. UTC | #3
On Fri, Feb 05, 2021 at 06:28:13PM -0500, Jeff Hostetler wrote:

> > OK. I'm still not sure of the endgame here for writing non-racy code to
> > establish the socket (which is going to require either some atomic
> > renaming or some dot-locking in the caller).  But it's plausible to me
> > that this option will be a useful primitive.
> 
> In part 14/14 in `ipc-unix-sockets.c:create_listener_socket()` I have
> code in the calling layer to (try to) handle both the startup races
> and basic collisions with existing long-running servers already using
> the socket.

There you make a temp socket and then try to rename it into place.  But
because rename() overwrites the destination, it still seems like two
creating processes can race each other. Something like:

  0. There's no "foo" socket (or maybe there is a stale one that
     nobody's listening on).

  1. Process A wants to become the listener. So it creates foo.A.

  2. Process B likewise. It creates foo.B.

  3. Process A renames foo.A to foo. It believes it will now service
     clients.

  4. Process B renames foo.B to foo. Now process A is stranded but
     doesn't realize it.

I.e., I don't think this is much different than an unlink+create
strategy. You've eliminated the window where a process C shows up during
steps 3 and 4 and sees no socket (because somebody else is in the midst
of a non-atomic unlink+create operation). But there's no atomicity
between the "ping the socket" and "create the socket" steps.
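
For concreteness, the shape I mean is roughly this (a simplified
sketch, not your actual patch 14 code; assumes the usual strbuf and
unix-socket headers):

  static int sketch_create_listener(const char *path,
                                    const struct unix_stream_listen_opts *opts)
  {
          struct strbuf tmp = STRBUF_INIT;
          int fd;

          /* "ping the socket": is anybody already listening there? */
          fd = unix_stream_connect(path);
          if (fd >= 0) {
                  close(fd);
                  return -1;
          }

          /* create a uniquely named socket, e.g. "foo.<pid>"... */
          strbuf_addf(&tmp, "%s.%"PRIuMAX, path, (uintmax_t)getpid());
          fd = unix_stream_listen(tmp.buf, opts);
          if (fd < 0) {
                  strbuf_release(&tmp);
                  return -1;
          }

          /* ...and rename it into place; NOT atomic with the probe above */
          if (rename(tmp.buf, path) < 0) {
                  close(fd);
                  unlink(tmp.buf);
                  fd = -1;
          }

          strbuf_release(&tmp);
          return fd;
  }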

> But you're right, it might be good to revisit that as a primitive at
> this layer.  We only have 1 other caller right now and I don't know
> enough about `credential-cache--daemon` to know if it would benefit
> from this or not.

Yeah, having seen patch 14, it looks like your only new caller always
sets the new unlink option to 1. So it might not be worth making it
optional if you don't need it (especially because the rename trick,
assuming it's portable, is superior to unlink+create; and you'd always
be fine with an unlink on the temp socket).

The call in credential-cache--daemon is definitely racy. It's pretty
much the same thing: it pings the socket to see if it's alive, but is
still susceptible to the problem above. I was never too concerned
about it, since the whole point of the daemon is to hang around until
its contents expire. If it loses the race and nobody contacts it, the
worst case is it waits 30 seconds for somebody to give it data before
exiting. It would benefit slightly from switching to the rename
strategy, but the bigger race would remain.

-Peff
Jeff Hostetler Feb. 9, 2021, 5:39 p.m. UTC | #4
On 2/9/21 11:32 AM, Jeff King wrote:
> On Fri, Feb 05, 2021 at 06:28:13PM -0500, Jeff Hostetler wrote:
> 
>>> OK. I'm still not sure of the endgame here for writing non-racy code to
>>> establish the socket (which is going to require either some atomic
>>> renaming or some dot-locking in the caller).  But it's plausible to me
>>> that this option will be a useful primitive.
>>
>> In part 14/14 in `ipc-unix-sockets.c:create_listener_socket()` I have
>> code in the calling layer to (try to) handle both the startup races
>> and basic collisions with existing long-running servers already using
>> the socket.
> 
> There you make a temp socket and then try to rename it into place.  But
> because rename() overwrites the destination, it still seems like two
> creating processes can race each other. Something like:
> 
>    0. There's no "foo" socket (or maybe there is a stale one that
>       nobody's listening on).
> 
>    1. Process A wants to become the listener. So it creates foo.A.
> 
>    2. Process B likewise. It creates foo.B.
> 
>    3. Process A renames foo.A to foo. It believes it will now service
>       clients.
> 
>    4. Process B renames foo.B to foo. Now process A is stranded but
>       doesn't realize it.
> 

Yeah, in my version two processes could still create uniquely named
sockets and then do the rename trick.  But they capture the inode
number of the socket before they do that.  They periodically lstat
the socket to see if the inode number has changed and if so, assume
it has been stolen from them.  (A bit of a hack, I admit.)
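
Something along these lines (illustrative sketch only; these names
are made up and not the actual series code):

  struct socket_identity {
          dev_t dev;
          ino_t ino;
  };

  static int capture_socket_identity(const char *path,
                                     struct socket_identity *sid)
  {
          struct stat st;

          if (lstat(path, &st) < 0)
                  return -1;
          sid->dev = st.st_dev;
          sid->ino = st.st_ino;
          return 0;
  }

  /* returns 1 if <path> no longer refers to the socket we created */
  static int socket_was_stolen(const char *path,
                               const struct socket_identity *sid)
  {
          struct stat st;

          if (lstat(path, &st) < 0)
                  return 1;       /* gone entirely counts as stolen */
          return st.st_dev != sid->dev || st.st_ino != sid->ino;
  }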

And I was assuming that 2 servers starting at about the same time
are effectively equivalent -- it doesn't matter which one dies, since
they both should have the same amount of cached state.  Unlike the
case where a long-running server (with lots of state) is replaced by
a newcomer.


> I.e., I don't think this is much different than an unlink+create
> strategy. You've eliminated the window where a process C shows up during
> steps 3 and 4 and sees no socket (because somebody else is in the midst
> of a non-atomic unlink+create operation). But there's no atomicity
> between the "ping the socket" and "create the socket" steps.
> 
>> But you're right, it might be good to revisit that as a primitive at
>> this layer.  We only have 1 other caller right now and I don't know
>> enough about `credential-cache--daemon` to know if it would benefit
>> from this or not.
> 
> Yeah, having seen patch 14, it looks like your only new caller always
> sets the new unlink option to 1. So it might not be worth making it
> optional if you don't need it (especially because the rename trick,
> assuming it's portable, is superior to unlink+create; and you'd always
> be fine with an unlink on the temp socket).


I am wondering if we can use the .LOCK file magic to our advantage
here (in sort of an off-label use).  If we have the server create a
lockfile "<path>.LOCK" and if successful leave it open/locked for the
life of the server (rather than immediately renaming it onto <path>)
and let the normal shutdown code rollback/delete the lockfile in the
cleanup/atexit.

If the server successfully creates the lockfile, then unlink+create
the socket at <path>.

That would give us the unique/exclusive creation (on the lock) that
we need.  Then wrap that with all the edge case cleanup code to
create/delete/manage the peer socket.  Basically if the lock exists,
there should be a live server listening to the socket (unless there
was a crash...).

And yes, then I don't think I need the `preserve_existing` bit in the
opts struct.

> 
> The call in credential-cache--daemon is definitely racy. It's pretty
> much the same thing: it pings the socket to see if it's alive, but is
> still susceptible to the problem above. I was never too concerned
> about it, since the whole point of the daemon is to hang around until
> its contents expire. If it loses the race and nobody contacts it, the
> worst case is it waits 30 seconds for somebody to give it data before
> exiting. It would benefit slightly from switching to the rename
> strategy, but the bigger race would remain.
> 
> -Peff
>
Jeff King Feb. 10, 2021, 3:55 p.m. UTC | #5
On Tue, Feb 09, 2021 at 12:39:22PM -0500, Jeff Hostetler wrote:

> Yeah, in my version two processes could still create uniquely named
> sockets and then do the rename trick.  But they capture the inode
> number of the socket before they do that.  They periodically lstat
> the socket to see if the inode number has changed and if so, assume
> it has been stolen from them.  (A bit of a hack, I admit.)

OK, that makes more sense. I saw the mention of the inode stuff in a
comment, but I didn't see it in the code (I guess if it's a periodic
check it's not in that initial socket creation function).

> And I was assuming that 2 servers starting at about the same time
> are effectively equivalent -- it doesn't matter which one dies, since
> they both should have the same amount of cached state.  Unlike the
> case where a long-running server (with lots of state) is replaced by
> a newcomer.

Yeah, I agree with that notion in general. I do think it would be easier
to reason about if the creation were truly race-proof (probably with a
dot-lock; see below), rather than the later "check if we got replaced"
thing.  OTOH, that "check" strategy covers a variety of cases (including
that somebody tried to ping us, decided we weren't alive due to a
timeout or some other system reason, and then replaced our socket).

Another strategy there could be having the daemon just decide to quit if
nobody contacts it for N time units. It is, after all, a cache. Even if
nobody replaces the socket, it probably makes sense to eventually decide
that the memory we're holding isn't going to good use.
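
E.g., the accept loop could treat a poll() timeout as "time to go
away" (sketch only; `idle_ms` and the request servicing here are
placeholders):

  static void accept_loop(int listen_fd, int idle_ms)
  {
          struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };

          for (;;) {
                  int ret = poll(&pfd, 1, idle_ms);
                  if (ret < 0) {
                          if (errno == EINTR)
                                  continue;
                          break;          /* poll error: shut down */
                  }
                  if (ret == 0)
                          break;          /* idle too long: shut down */
                  if (pfd.revents & POLLIN) {
                          int client = accept(listen_fd, NULL, NULL);
                          if (client >= 0) {
                                  /* ...service one request... */
                                  close(client);
                          }
                  }
          }
  }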

> I am wondering if we can use the .LOCK file magic to our advantage
> here (in sort of an off-label use).  If we have the server create a
> lockfile "<path>.LOCK" and if successful leave it open/locked for the
> life of the server (rather than immediately renaming it onto <path>)
> and let the normal shutdown code rollback/delete the lockfile in the
> cleanup/atexit.
> 
> If the server successfully creates the lockfile, then unlink+create
> the socket at <path>.

I don't even think this is off-label. Though the normal use is for the
.lock file to get renamed into place as the official file, there are a
few other places where we use it solely for mutual exclusion. You just
always end with rollback_lock_file(), and never "commit" it.

So something like:

  1. Optimistically see if socket "foo" is present and accepting
     connections.

  2. If not, then take "foo.lock". If somebody else is holding it, loop
     with a timeout waiting for them to come alive.

  3. Assuming we got the lock, then either unlink+create the socket as
     "foo", or rename-into-place. I don't think it matters that much
     which.

  4. Rollback "foo.lock", unlinking it.

Then one process wins the lock and creates the socket, while any
simultaneous creators spin in step 2, and eventually connect to the
winner.
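
In code that might look roughly like this (sketch only; assumes
lockfile.h and the new unix-socket API, with the timeout left to the
caller):

  static int create_listener_with_lock(const char *path, long timeout_ms)
  {
          struct unix_stream_listen_opts opts = UNIX_STREAM_LISTEN_OPTS_INIT;
          struct lock_file lk = LOCK_INIT;
          int fd;

          /* 1. optimistically see if somebody is already listening */
          fd = unix_stream_connect(path);
          if (fd >= 0) {
                  close(fd);
                  return -1;      /* an existing server wins */
          }

          /* 2. take "<path>.lock", waiting up to timeout_ms for a holder */
          if (hold_lock_file_for_update_timeout(&lk, path, 0, timeout_ms) < 0)
                  return -1;

          /* 3. while holding the lock, (re)create the socket at <path> */
          fd = unix_stream_listen(path, &opts);

          /* 4. drop "<path>.lock" now that setup is done */
          rollback_lock_file(&lk);

          return fd;
  }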

> That would give us the unique/exclusive creation (on the lock) that
> we need.  Then wrap that with all the edge case cleanup code to
> create/delete/manage the peer socket.  Basically if the lock exists,
> there should be a live server listening to the socket (unless there
> was a crash...).

I think you'd want to delete the lock as soon as you're done with the
setup. That reduces the chances that a dead server (e.g., killed by a
power outage without the chance to clean up after itself) leaves a stale
lock sitting around.

-Peff
Jeff Hostetler Feb. 10, 2021, 9:31 p.m. UTC | #6
On 2/10/21 10:55 AM, Jeff King wrote:
> On Tue, Feb 09, 2021 at 12:39:22PM -0500, Jeff Hostetler wrote:
> 
>> Yeah, in my version two processes could still create uniquely named
>> sockets and then do the rename trick.  But they capture the inode
>> number of the socket before they do that.  They periodically lstat
>> the socket to see if the inode number has changed and if so, assume
>> it has been stolen from them.  (A bit of a hack, I admit.)
> 
> OK, that makes more sense. I saw the mention of the inode stuff in a
> comment, but I didn't see it in the code (I guess if it's a periodic
> check it's not in that initial socket creation function).
> 

Yeah, there's a very slow poll(2) loop in the listen/accept thread
that watches for that and new connections (and quit messages).

>> And I was assuming that 2 servers starting at about the same time
>> are effectively equivalent -- it doesn't matter which one dies, since
>> they both should have the same amount of cached state.  Unlike the
>> case where a long-running server (with lots of state) is replaced by
>> a newcomer.
> 
> Yeah, I agree with that notion in general. I do think it would be easier
> to reason about if the creation were truly race-proof (probably with a
> dot-lock; see below), rather than the later "check if we got replaced"
> thing.  OTOH, that "check" strategy covers a variety of cases (including
> that somebody tried to ping us, decided we weren't alive due to a
> timeout or some other system reason, and then replaced our socket).
> 
> Another strategy there could be having the daemon just decide to quit if
> nobody contacts it for N time units. It is, after all, a cache. Even if
> nobody replaces the socket, it probably makes sense to eventually decide
> that the memory we're holding isn't going to good use.

I have the poll(2) loop set to recheck the inode for theft every 60
seconds (randomly chosen).

Assuming the socket isn't stolen, I want to leave any thoughts of
an auto-shutdown to the application layer above it.  My next patch
series will use this ipc mechanism to build an FSMonitor daemon that
will watch the filesystem for changes and then be able to quickly
respond to a `git status`, so it is important that it be allowed to
run without any clients for a while (such as during a build).  Yes,
memory concerns are important, so I do want it to auto-shutdown if
the socket is stolen (or the workdir is deleted).

> 
>> I am wondering if we can use the .LOCK file magic to our advantage
>> here (in sort of an off-label use).  If we have the server create a
>> lockfile "<path>.LOCK" and if successful leave it open/locked for the
>> life of the server (rather than immediately renaming it onto <path>)
>> and let the normal shutdown code rollback/delete the lockfile in the
>> cleanup/atexit.
>>
>> If the server successfully creates the lockfile, then unlink+create
>> the socket at <path>.
> 
> I don't even think this is off-label. Though the normal use is for the
> .lock file to get renamed into place as the official file, there are a
> few other places where we use it solely for mutual exclusion. You just
> always end with rollback_lock_file(), and never "commit" it.
> 
> So something like:
> 
>    1. Optimistically see if socket "foo" is present and accepting
>       connections.
> 
>    2. If not, then take "foo.lock". If somebody else is holding it, loop
>       with a timeout waiting for them to come alive.
> 
>    3. Assuming we got the lock, then either unlink+create the socket as
>       "foo", or rename-into-place. I don't think it matters that much
>       which.
> 
>    4. Rollback "foo.lock", unlinking it.
> 
> Then one process wins the lock and creates the socket, while any
> simultaneous creators spin in step 2, and eventually connect to the
> winner.
> 
>> That would give us the unique/exclusive creation (on the lock) that
>> we need.  Then wrap that with all the edge case cleanup code to
>> create/delete/manage the peer socket.  Basically if the lock exists,
>> there should be a live server listening to the socket (unless there
>> was a crash...).
> 
> I think you'd want to delete the lock as soon as you're done with the
> setup. That reduces the chances that a dead server (e.g., killed by a
> power outage without the chance to clean up after itself) leaves a stale
> lock sitting around.

Thanks, this helps.  I've got a version now that is a slight variation
on what you have here; it seems to work nicely and has the short-lived
lock file.  I'll post it shortly.

Jeff

Patch

diff --git a/builtin/credential-cache--daemon.c b/builtin/credential-cache--daemon.c
index c61f123a3b8..4c6c89ab0de 100644
--- a/builtin/credential-cache--daemon.c
+++ b/builtin/credential-cache--daemon.c
@@ -203,9 +203,10 @@  static int serve_cache_loop(int fd)
 
 static void serve_cache(const char *socket_path, int debug)
 {
+	struct unix_stream_listen_opts opts = UNIX_STREAM_LISTEN_OPTS_INIT;
 	int fd;
 
-	fd = unix_stream_listen(socket_path);
+	fd = unix_stream_listen(socket_path, &opts);
 	if (fd < 0)
 		die_errno("unable to bind to '%s'", socket_path);
 
diff --git a/unix-socket.c b/unix-socket.c
index ef2aeb46bcd..8bcef18ea55 100644
--- a/unix-socket.c
+++ b/unix-socket.c
@@ -88,24 +88,35 @@  int unix_stream_connect(const char *path)
 	return -1;
 }
 
-int unix_stream_listen(const char *path)
+int unix_stream_listen(const char *path,
+		       const struct unix_stream_listen_opts *opts)
 {
-	int fd, saved_errno;
+	int fd = -1;
+	int saved_errno;
+	int bind_successful = 0;
+	int backlog;
 	struct sockaddr_un sa;
 	struct unix_sockaddr_context ctx;
 
-	unlink(path);
-
 	if (unix_sockaddr_init(&sa, path, &ctx) < 0)
 		return -1;
+
 	fd = socket(AF_UNIX, SOCK_STREAM, 0);
 	if (fd < 0)
-		die_errno("unable to create socket");
+		goto fail;
+
+	if (opts->force_unlink_before_bind)
+		unlink(path);
 
 	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
 		goto fail;
+	bind_successful = 1;
 
-	if (listen(fd, 5) < 0)
+	if (opts->listen_backlog_size > 0)
+		backlog = opts->listen_backlog_size;
+	else
+		backlog = 5;
+	if (listen(fd, backlog) < 0)
 		goto fail;
 
 	unix_sockaddr_cleanup(&ctx);
@@ -114,7 +125,10 @@  int unix_stream_listen(const char *path)
 fail:
 	saved_errno = errno;
 	unix_sockaddr_cleanup(&ctx);
-	close(fd);
+	if (fd != -1)
+		close(fd);
+	if (bind_successful)
+		unlink(path);
 	errno = saved_errno;
 	return -1;
 }
diff --git a/unix-socket.h b/unix-socket.h
index e271aeec5a0..c28372ef48e 100644
--- a/unix-socket.h
+++ b/unix-socket.h
@@ -1,7 +1,19 @@ 
 #ifndef UNIX_SOCKET_H
 #define UNIX_SOCKET_H
 
+struct unix_stream_listen_opts {
+	int listen_backlog_size;
+	unsigned int force_unlink_before_bind:1;
+};
+
+#define UNIX_STREAM_LISTEN_OPTS_INIT \
+{ \
+	.listen_backlog_size = 5, \
+	.force_unlink_before_bind = 1, \
+}
+
 int unix_stream_connect(const char *path);
-int unix_stream_listen(const char *path);
+int unix_stream_listen(const char *path,
+		       const struct unix_stream_listen_opts *opts);
 
 #endif /* UNIX_SOCKET_H */