Message ID | pull.766.git.1610465492.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Simple IPC Mechanism | expand |
On Tue, Jan 12 2021, Jeff Hostetler via GitGitGadget wrote: > This series introduces a multi-threaded IPC mechanism called "Simple IPC". > This is a library-layer feature to make it easy to create very long running > daemon/service applications and for unrelated Git commands to communicate > with them. Communication uses pkt-line messaging over a Windows named pipe > or Unix domain socket. > > On the server side, Simple IPC implements a (platform-specific) connection > listener and worker thread-pool to accept and handle a series of client > connections. The server functionality is completely hidden behind the > ipc_server_run() and ipc_server_run_async() APIs. The daemon/service > application only needs to define an application-specific callback to handle > client requests. > > Note that Simple IPC is completely unrelated to the long running process > feature (described in sub-process.h) where the lifetime of a "sub-process" > child is bound to that of the invoking parent process and communication > occurs over the child's stdin/stdout. > > Simple IPC will serve as a basis for a future builtin FSMonitor daemon > feature. I only skimmed this so far. In the past we had a git-read-cache--daemon -> git-index-helper[1] -> watchman. The last iteration of that seems to be the [3] re-roll from Ben Peart in 2017. I used/tested that for a while and had some near-production use-cases of it. How does this new series relate to that past work (if at all), and (not having re-read the old threads) were there reasons those old patch serieses weren't merged in that are addressed here, mitigated etc? 1. https://lore.kernel.org/git/1402406665-27988-1-git-send-email-pclouds@gmail.com/ 2. https://lore.kernel.org/git/1457548582-28302-1-git-send-email-dturner@twopensource.com/ 3. https://lore.kernel.org/git/20170518201333.13088-1-benpeart@microsoft.com/ 4. https://lore.kernel.org/git/87bmhfwmqa.fsf@evledraar.gmail.com/
On 1/12/21 11:50 AM, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Jan 12 2021, Jeff Hostetler via GitGitGadget wrote: > >> This series introduces a multi-threaded IPC mechanism called "Simple IPC". >> This is a library-layer feature to make it easy to create very long running >> daemon/service applications and for unrelated Git commands to communicate >> with them. Communication uses pkt-line messaging over a Windows named pipe >> or Unix domain socket. >> >> On the server side, Simple IPC implements a (platform-specific) connection >> listener and worker thread-pool to accept and handle a series of client >> connections. The server functionality is completely hidden behind the >> ipc_server_run() and ipc_server_run_async() APIs. The daemon/service >> application only needs to define an application-specific callback to handle >> client requests. >> >> Note that Simple IPC is completely unrelated to the long running process >> feature (described in sub-process.h) where the lifetime of a "sub-process" >> child is bound to that of the invoking parent process and communication >> occurs over the child's stdin/stdout. >> >> Simple IPC will serve as a basis for a future builtin FSMonitor daemon >> feature. > > I only skimmed this so far. In the past we had a git-read-cache--daemon > -> git-index-helper[1] -> watchman. The last iteration of that seems to > be the [3] re-roll from Ben Peart in 2017. I used/tested that for a > while and had some near-production use-cases of it. > > How does this new series relate to that past work (if at all), and (not > having re-read the old threads) were there reasons those old patch > serieses weren't merged in that are addressed here, mitigated etc? > > 1. https://lore.kernel.org/git/1402406665-27988-1-git-send-email-pclouds@gmail.com/ > 2. https://lore.kernel.org/git/1457548582-28302-1-git-send-email-dturner@twopensource.com/ > 3. https://lore.kernel.org/git/20170518201333.13088-1-benpeart@microsoft.com/ > 4. https://lore.kernel.org/git/87bmhfwmqa.fsf@evledraar.gmail.com/ > I'm starting with the model used by the existing FSMonitor feature that Ben Peart and Kevin Willford added to Git. Item [3] looks to be an earlier draft of that effort. The idea there was to add the fsmonitor hook that could talk to a daemon like Watchman and quickly update the in-memory cache-entry flags without the need to lstat() and similarly update the untracked-cache. An index extension was added to remember the last fsmonitor response processed. Currently in Git, we have a fsmonitor hook (usually a perl script) that talks to Watchman and translates the Watchman response back into something that the Git client can understand. This comes back as a list of files that have changed since some timestamp (or in V2, relative to some daemon-specific token). Items [1,2] are not related to that. That was a different effort to quickly fetch a read-only copy of an already-parsed index via shared memory. In the last version I saw, there were 2 daemons. index-helper kept a fresh view of the index in shared memory and could give it to the Git client. The client could just mmap the pre-parsed index and avoid calling `read_index()`. Index-helper would drive Watchman to keep track of cache-entries as they changed and handle the lstat's. I'm not familiar with [4] (and I only quickly scanned it). There are several ideas for finding slow spots while reading the index. I don't want to go into all of them, but several are obsolete now. They didn't contribute to the current effort. The Simple IPC series (and a soon to be submitted fsmonitor--daemon series) are intended to be a follow on to FSMonitor effort that is currently in Git. 1. Build a git-native daemon to watch the file system and avoid needing a third-party tool. This doesn't preclude the use of Watchman, but having a builtin tool might simplify engineering support costs when deploying to a large team. 2. Use direct IPC between the Git command and the daemon to avoid the expense of the Hook API (which is expensive on Windows). 3. Make the daemon Git-aware. For example, it might want to pre-filter ignored files. (This might not be present in V1. And we might extend the daemon to do more of this as we improve performance.) Jeff
"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes: > This series introduces a multi-threaded IPC mechanism called "Simple IPC". > This is a library-layer feature to make it easy to create very long running > daemon/service applications and for unrelated Git commands to communicate > with them. Communication uses pkt-line messaging over a Windows named pipe > or Unix domain socket. > > On the server side, Simple IPC implements a (platform-specific) connection > listener and worker thread-pool to accept and handle a series of client > connections. The server functionality is completely hidden behind the > ipc_server_run() and ipc_server_run_async() APIs. The daemon/service > application only needs to define an application-specific callback to handle > client requests. > > Note that Simple IPC is completely unrelated to the long running process > feature (described in sub-process.h) where the lifetime of a "sub-process" > child is bound to that of the invoking parent process and communication > occurs over the child's stdin/stdout. > > Simple IPC will serve as a basis for a future builtin FSMonitor daemon > feature. What kind of security implications does this bring into the system? Can a Simple IPC daemon be connected by any client? How does the daemon know that the other side is authorized to make requests? When a git binary acting as client connect to whatever happens to be listening to the well-known location, how does it know if the other side is the daemon it wanted to talk to and not a malicious MITM or other impersonator? Or is this to be only used for "this is meant to be used to give read-only access to data that is public anyway" kind of daemon, e.g. "git://" transport that serves clones and fetches? Or is this meant to be used on client machines where all the processes are assumed to be working for the end user, so it is OK to declare that anything will go (good old DOS mental model?) I know at the Mechanism level we do not yet know how it will be used, but we cannot retrofit sufficient security, so it would be necessary to know answers to these questions. Thanks.
On 1/12/21 3:01 PM, Junio C Hamano wrote: > "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> This series introduces a multi-threaded IPC mechanism called "Simple IPC". >> This is a library-layer feature to make it easy to create very long running >> daemon/service applications and for unrelated Git commands to communicate >> with them. Communication uses pkt-line messaging over a Windows named pipe >> or Unix domain socket. >> >> On the server side, Simple IPC implements a (platform-specific) connection >> listener and worker thread-pool to accept and handle a series of client >> connections. The server functionality is completely hidden behind the >> ipc_server_run() and ipc_server_run_async() APIs. The daemon/service >> application only needs to define an application-specific callback to handle >> client requests. >> >> Note that Simple IPC is completely unrelated to the long running process >> feature (described in sub-process.h) where the lifetime of a "sub-process" >> child is bound to that of the invoking parent process and communication >> occurs over the child's stdin/stdout. >> >> Simple IPC will serve as a basis for a future builtin FSMonitor daemon >> feature. > > What kind of security implications does this bring into the system? > > Can a Simple IPC daemon be connected by any client? How does the > daemon know that the other side is authorized to make requests? > When a git binary acting as client connect to whatever happens to be > listening to the well-known location, how does it know if the other > side is the daemon it wanted to talk to and not a malicious MITM or > other impersonator? > > Or is this to be only used for "this is meant to be used to give > read-only access to data that is public anyway" kind of daemon, > e.g. "git://" transport that serves clones and fetches? > > Or is this meant to be used on client machines where all the > processes are assumed to be working for the end user, so it is OK to > declare that anything will go (good old DOS mental model?) > > I know at the Mechanism level we do not yet know how it will be > used, but we cannot retrofit sufficient security, so it would be > necessary to know answers to these questions. > > Thanks. > Good questions. Yes, this is a local-only mechanism. A local-only named pipe on Windows and a Unix domain socket on Unix. In both cases the daemon creates the pipe/socket as the foreground user (since the daemon process will be implicitly started by the first Git command that needs to talk to it). Later client process try to open the pipe/socket with RW access if they can. On Windows a local named pipe is created by the server side. It rejects remote connections. I did not put an ACL, so it should inherit the system default which grants the user RW access (since the daemon is implicitly started by the first foreground client command that needs to talk to it.) Other users in the user's group and the anonymous user should have R but not W access to it, so they could not be able to connect. The name pipe is kept in the local Named Pipe File System (NPFS) as `\\.\pipe\<unique-path>` so it is globally visible on the system, but I don't think it is a problem. On the Unix side, the socket is created inside the .git directory by the daemon. Potential clients would have to have access to the working directory and the .git directory to connect to the socket, so in normal circumstances they would be able to read everything in the WD anyway. So again, I don't think it is a problem. Alternatively, if a malicious server is started and holds the named pipe or socket: the client might be able to talk to it (assuming that bad server grants access or impersonates the rightful user). The client might not be able to tell they've been tricked, but at that point the system is already compromised. So unless I'm missing something, I think we're OK either way. Jeff
Jeff Hostetler <git@jeffhostetler.com> writes: > On Windows a local named pipe is created by the server side. It rejects > remote connections. I did not put an ACL, so it should inherit the > system default which grants the user RW access (since the daemon is > implicitly started by the first foreground client command that needs > to talk to it.) Other users in the user's group and the anonymous > user should have R but not W access to it, so they could not be able > to connect. The name pipe is kept in the local Named Pipe File System > (NPFS) as `\\.\pipe\<unique-path>` so it is globally visible on the > system, but I don't think it is a problem. It is not intuitively obvious why globalluy visible thing is OK to me, but I'll take your word for it on stuff about Windows. > On the Unix side, the socket is created inside the .git directory > by the daemon. Potential clients would have to have access to the > working directory and the .git directory to connect to the socket, > so in normal circumstances they would be able to read everything in > the WD anyway. So again, I don't think it is a problem. OK, yes, writability to .git would automatically mean that everything is a fair game to those who can talk to the daemon, so there is no new issue here, as long as the first process that creates the socket is careful not to loosen the permission. Thanks.
On 1/12/21 7:13 PM, Junio C Hamano wrote: > Jeff Hostetler <git@jeffhostetler.com> writes: > >> On Windows a local named pipe is created by the server side. It rejects >> remote connections. I did not put an ACL, so it should inherit the >> system default which grants the user RW access (since the daemon is >> implicitly started by the first foreground client command that needs >> to talk to it.) Other users in the user's group and the anonymous >> user should have R but not W access to it, so they could not be able >> to connect. The name pipe is kept in the local Named Pipe File System >> (NPFS) as `\\.\pipe\<unique-path>` so it is globally visible on the >> system, but I don't think it is a problem. > > It is not intuitively obvious why globalluy visible thing is OK to > me, but I'll take your word for it on stuff about Windows. Sorry, that's a quirk of Windows. Windows has a funky virtual drive where named pipes are stored -- kind of like a magic directory in /proc on Linux. All local named pipes have the "\\.\pipe\" path prefix. So they are globally visible as a side-effect of that "namespace" restriction. > >> On the Unix side, the socket is created inside the .git directory >> by the daemon. Potential clients would have to have access to the >> working directory and the .git directory to connect to the socket, >> so in normal circumstances they would be able to read everything in >> the WD anyway. So again, I don't think it is a problem. > > OK, yes, writability to .git would automatically mean that > everything is a fair game to those who can talk to the daemon, so > there is no new issue here, as long as the first process that > creates the socket is careful not to loosen the permission. > > Thanks. >
On Tue, Jan 12, 2021 at 06:25:20PM -0500, Jeff Hostetler wrote: > On the Unix side, the socket is created inside the .git directory > by the daemon. Potential clients would have to have access to the > working directory and the .git directory to connect to the socket, > so in normal circumstances they would be able to read everything in > the WD anyway. So again, I don't think it is a problem. Just thinking out loud, here are two potential issues with putting it in .git that we may have to deal with later: - fsmonitor is conceptually a read-only thing (i.e., it would speed up "git status", etc). And not knowing much about how it will work, I'd guess that is carried through (i.e., even though you may open the socket R/W so that you can write requests and read them back, there is no operation you can request that will overwrite data). But the running user may not have write access to .git. As long as we cleanly bail to the non-fsmonitor code paths, I don't think it's the end of the world. Those read-only users just won't get to use the speedup (and it may even be desirable). They may complain, but it is open source so the onus is on them to improve it. You will not have made anything worse. :) - repositories may be on network filesystems that do not support unix sockets. So it would be nice if there was some way to specify an alternate path to be used for the socket. Possibly one or both of: - a config option to give a root path for sockets, where Git would then canonicalize the $GIT_DIR name and use $root/$GIT_DIR for the socket. That solves the problem for a given user once for all repos. - a config option to say "use this path for the socket". This would be per-repo, but is more flexible and possibly less confusing. One final note: on some systems[1] the permissions on the socket file itself are ignored. The safe way to protect it is to make sure the permissions on the surrounding directory are what you want. See credential-cache's init_socket_directory() for an example. -Peff [1] Sorry, I don't remember which systems. This is one of those random bits of Unix lore I've carried around for 20 years, and it's entirely possible it is simply obsolete at this point.
On Wed, Jan 13 2021, Jeff King wrote: > On Tue, Jan 12, 2021 at 06:25:20PM -0500, Jeff Hostetler wrote: > >> On the Unix side, the socket is created inside the .git directory >> by the daemon. Potential clients would have to have access to the >> working directory and the .git directory to connect to the socket, >> so in normal circumstances they would be able to read everything in >> the WD anyway. So again, I don't think it is a problem. > > Just thinking out loud, here are two potential issues with putting it in > .git that we may have to deal with later: > > - fsmonitor is conceptually a read-only thing (i.e., it would speed up > "git status", etc). And not knowing much about how it will work, I'd > guess that is carried through (i.e., even though you may open the > socket R/W so that you can write requests and read them back, there > is no operation you can request that will overwrite data). But the > running user may not have write access to .git. > > As long as we cleanly bail to the non-fsmonitor code paths, I don't > think it's the end of the world. Those read-only users just won't > get to use the speedup (and it may even be desirable). They may > complain, but it is open source so the onus is on them to improve > it. You will not have made anything worse. :) > > - repositories may be on network filesystems that do not support unix > sockets. > > So it would be nice if there was some way to specify an alternate path > to be used for the socket. Possibly one or both of: > > - a config option to give a root path for sockets, where Git would > then canonicalize the $GIT_DIR name and use $root/$GIT_DIR for the > socket. That solves the problem for a given user once for all repos. > > - a config option to say "use this path for the socket". This would be > per-repo, but is more flexible and possibly less confusing. > > One final note: on some systems[1] the permissions on the socket file > itself are ignored. The safe way to protect it is to make sure the > permissions on the surrounding directory are what you want. See > credential-cache's init_socket_directory() for an example. > > -Peff > > [1] Sorry, I don't remember which systems. This is one of those random > bits of Unix lore I've carried around for 20 years, and it's > entirely possible it is simply obsolete at this point. According to StackExchange lore this seems to have been the case with 4.2 BSD & maybe something obscure like HP/UX: https://unix.stackexchange.com/questions/83032/which-systems-do-not-honor-socket-read-write-permissions I'd say it's probably safe to ignore this as a concern for new features in git in general, and certainly for something like a thing intended for a watchman-like program which users are likely to only run on modern OS's.