Message ID: 20240704003818.750223-1-sandals@crustytoothpaste.net
Series: Additional FAQ entries
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > This series introduces some additional Git FAQ entries on various > topics. They are all things I've seen in my professional life or on > Stack Overflow, so I've written documentation. Just to help other readers: v1 https://lore.kernel.org/git/20211020010624.675562-1-sandals@crustytoothpaste.net/ v2 https://lore.kernel.org/git/20211107225525.431138-1-sandals@crustytoothpaste.net/ are where the previous discussions are found. > There were some suggestions in the past that the text "modify, tamper > with, or buffer" might be somewhat redundant, but I've chosen to keep > the text as it is to avoid arguments like, "Well, buffering the entire > request or response isn't really modifying it, so Git should just work > in that situation," when we already know that doesn't work. > > Changes from v2 (partial): > * Add documentation on proxies to the configuration documentation as > well. > * Mention some security problems that are known to occur with TLS MITM > proxies. This mirrors the similar Git LFS documentation. > * Provide a documentation example about how to use proxies with SSH. > * Recommend running a `git fsck` after syncing with rsync. > > Changes from v1: > * Drop the monorepo patch for now; I want to revise it further. > * Reorder the working tree patch to place more warnings up front. > * Mention core.gitproxy and socat. > * Rephrase text in the EOL entry to read correctly and be easier to > understand. > * Improve the commit message for the working tree FAQ entry to make it > clearer that users wish to transfer uncommitted changes. > > brian m. 
carlson (4): > gitfaq: add documentation on proxies > gitfaq: give advice on using eol attribute in gitattributes > gitfaq: add entry about syncing working trees > doc: mention that proxies must be completely transparent > > Documentation/config/http.txt | 5 ++ > Documentation/gitfaq.txt | 105 ++++++++++++++++++++++++++++++++-- > 2 files changed, 104 insertions(+), 6 deletions(-)
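For readers unfamiliar with the core.gitproxy/socat combination mentioned in the v1 change list, here is one way such a setup can look. This is an illustrative sketch, not text from the series: the proxy host `proxy.example.com`, port 3128, and the `example.org` domain are placeholders, and the wrapper script name is mine.

```shell
# Git invokes the core.gitproxy command with two arguments: the remote
# host and the remote port.  A small wrapper can hand those to socat,
# which relays stdin/stdout through an HTTP CONNECT proxy.
cat >"$HOME/bin/git-proxy" <<'EOF'
#!/bin/sh
# $1 = remote host, $2 = remote port (both supplied by Git)
exec socat STDIO PROXY:proxy.example.com:"$1":"$2",proxyport=3128
EOF
chmod +x "$HOME/bin/git-proxy"

# Apply the wrapper only to git:// URLs in a particular domain:
git config --global core.gitproxy '"~/bin/git-proxy" for example.org'
```

Because socat simply shuttles bytes in both directions without buffering whole messages, a relay like this stays "completely transparent" in the sense the series documents.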
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > This series introduces some additional Git FAQ entries on various > topics. They are all things I've seen in my professional life or on > Stack Overflow, so I've written documentation. > > There were some suggestions in the past that the text "modify, tamper > with, or buffer" might be somewhat redundant, but I've chosen to keep > the text as it is to avoid arguments like, "Well, buffering the entire > request or response isn't really modifying it, so Git should just work > in that situation," when we already know that doesn't work. Buffering the entire thing will break because ...? Deadlock? Or is there anything more subtle going on? Are we affected by any frame boundary (do we even notice?) that happens at layer lower than our own pkt-line layer at all (i.e. we sent two chunks and we fail to work on them correctly if the network collapses them into one chunk, without changing a single byte, just changing the number of read() system calls that reads them?)?
On 2024-07-04 at 05:22:27, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
> > This series introduces some additional Git FAQ entries on various
> > topics.  They are all things I've seen in my professional life or on
> > Stack Overflow, so I've written documentation.
> >
> > There were some suggestions in the past that the text "modify, tamper
> > with, or buffer" might be somewhat redundant, but I've chosen to keep
> > the text as it is to avoid arguments like, "Well, buffering the entire
> > request or response isn't really modifying it, so Git should just work
> > in that situation," when we already know that doesn't work.
>
> Buffering the entire thing will break because ...?  Deadlock?  Or is
> there anything more subtle going on?

When we use the smart HTTP protocol, the server sends keep-alive and
status messages as one of the data streams, which is important because
(a) the user is usually impatient and wants to know what's going on and
(b) it may take a long time to pack the data, especially for large
repositories, and sending no data may result in the connection being
dropped or the client being served a 500 by an intermediate layer.  We
know this does happen and I've seen reports of it.

We've also seen some cases where proxies refuse to accept
Transfer-Encoding: chunked (let's party like it's 1999) and send a 411
back since there's no Content-Length header.  That's presumably because
they want to scan the contents for "bad" data all in one chunk, but Git
has to stream the contents unless the data fits in the buffer size.
(This is the one case where http.postBuffer actually makes a
difference.)  I very much doubt that the appliance actually wants to get
a 2 GiB payload to scan, since it probably doesn't have tons of memory
in the first place, but that is what it's asking for.

> Are we affected by any frame boundary (do we even notice?) that
> happens at a layer lower than our own pkt-line layer at all (i.e. we
> sent two chunks and we fail to work on them correctly if the network
> collapses them into one chunk, without changing a single byte, just
> changing the number of read() system calls that read them?)?

No, that's not a problem.  We read four bytes for the pkt-line header,
and then we read the entire body based on that length until we get all
of it.  This is also the way OpenSSL works for TLS packets and is known
to work well.  If the underlying TCP connection provides a partial or
incomplete packet (which can happen due to MTU), we'll just block until
the rest comes in, which is fine.
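The length-prefixed read described above can be sketched in a few lines of Python. This is a simplified model of Git's pkt-line framing (it ignores the v2 special packets, and the helper names are mine, not Git's), but it shows why re-segmentation at the TCP layer is invisible to the reader:

```python
import io

def read_exactly(stream, n):
    """Loop over read() until n bytes are collected, riding out
    payloads that arrive split across multiple reads (MTU, proxies,
    or the kernel coalescing segments)."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("short read")
        buf += chunk
    return buf

def read_pkt_line(stream):
    """Read one pkt-line: a 4-byte hex length (which counts the four
    header bytes themselves) followed by exactly that many payload
    bytes.  A length of 0000 is a flush packet, returned as None."""
    header = read_exactly(stream, 4)
    length = int(header, 16)
    if length == 0:
        return None  # flush-pkt
    return read_exactly(stream, length - 4)
```

For example, `read_pkt_line(io.BytesIO(b"0009hello"))` yields `b"hello"`: the reader only cares about the declared length, never about how many read() calls it took to obtain it.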
"brian m. carlson" <sandals@crustytoothpaste.net> writes: >> Buffering the entire thing will break because ...? Deadlock? Or is >> there anything more subtle going on? > > When we use the smart HTTP protocol, the server sends keep-alive and > status messages as one of the data streams, which is important because > (a) the user is usually impatient and wants to know what's going on and > (b) it may take a long time to pack the data, especially for large > repositories, and sending no data may result in the connection being > dropped or the client being served a 500 by an intermediate layer. We > know this does happen and I've seen reports of it. And this is an example of "a proxy that buffers the data, without modifying or tampering with, would still break transport"? > We've also seen some cases where proxies refuse to accept > Transfer-Encoding: chunked (let's party like it's 1999) and send a 411 > back since there's no Content-Length header. This is "a proxy that wanted to buffer the data but failed to do so" that ended up modifying the data Gits sitting at both ends of the connection can observe, so it is a bit different issue. It clearly falls into "modify or tampering with" category. I forgot to say this clearly when I wrote the message you are responding to, but I am trying to see if we can clarify the "or buffer" part in "modify, tamper with, or buffer", as offhand I did not think of a reason why a proxy would break the Git communication if it receives a segment that was 2MB originally from upload-pack, and forwards the contents of the segment in two 1MB segments without tampering or modifying the payload bytes at all to fetch-pack. Thanks.
On Thu, Jul 04, 2024 at 09:23:28PM +0000, brian m. carlson wrote:
> > Buffering the entire thing will break because ...?  Deadlock?  Or is
> > there anything more subtle going on?
>
> When we use the smart HTTP protocol, the server sends keep-alive and
> status messages as one of the data streams, which is important because
> (a) the user is usually impatient and wants to know what's going on and
> (b) it may take a long time to pack the data, especially for large
> repositories, and sending no data may result in the connection being
> dropped or the client being served a 500 by an intermediate layer.  We
> know this does happen and I've seen reports of it.

Additionally, I think for non-HTTP transports (think proxying ssh
through socat or similar), buffering the v0 protocol is likely a total
disaster.  The fetch protocol assumes both sides spewing at each other
in real time.  HTTP, even v0, follows a request/response model, so
we're safer there.

I do think some amount of buffering is often going to be OK in
practice.  You'd get delayed keep-alives and progress reports, which
may range from "annoying" to "something in the middle decided to time
out".  So I'm OK with just telling people "make sure your proxies
aren't buffering" as a general rule, rather than trying to get into the
nitty gritty of what is going to break and how.

-Peff
Jeff King <peff@peff.net> writes:

> On Thu, Jul 04, 2024 at 09:23:28PM +0000, brian m. carlson wrote:
>
>> > Buffering the entire thing will break because ...?  Deadlock?  Or is
>> > there anything more subtle going on?
>>
>> When we use the smart HTTP protocol, the server sends keep-alive and
>> status messages as one of the data streams, which is important because
>> (a) the user is usually impatient and wants to know what's going on and
>> (b) it may take a long time to pack the data, especially for large
>> repositories, and sending no data may result in the connection being
>> dropped or the client being served a 500 by an intermediate layer.  We
>> know this does happen and I've seen reports of it.
>
> Additionally, I think for non-HTTP transports (think proxying ssh
> through socat or similar), buffering the v0 protocol is likely a total
> disaster.  The fetch protocol assumes both sides spewing at each other
> in real time.

Yeah, beyond one "window" that a series of "have"s are allowed to be in
flight, no further "have"s are sent before seeing an "ack/nack"
response, so if you buffer too much, they can deadlock fairly easily.

> ...  So I'm OK
> with just telling people "make sure your proxies aren't buffering" as a
> general rule, rather than trying to get into the nitty gritty of what
> is going to break and how.

Sounds fair.  Thanks.
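The windowed-negotiation deadlock Junio describes can be made concrete with a toy model. All names and constants here are illustrative, not Git's actual window sizes; the point is only the shape of the failure: the client stops after a window's worth of unacknowledged "have"s, while a buffering proxy refuses to forward anything until its buffer fills, so neither side can make progress.

```python
def negotiation_stalls(total_haves, window, have_size, proxy_flush_threshold):
    """Return True if a buffering proxy would deadlock this (toy)
    negotiation: the client blocks once `window` haves are in flight,
    and the proxy only forwards to the server (letting it ACK) after
    buffering `proxy_flush_threshold` bytes."""
    buffered = 0    # bytes sitting in the proxy, unseen by the server
    in_flight = 0   # unacknowledged haves
    sent = 0
    while sent < total_haves:
        if in_flight == window:
            if buffered < proxy_flush_threshold:
                # Client is blocked waiting for an ACK the server can
                # never send: the proxy is still waiting for more data.
                return True
            # Proxy flushes; server ACKs; the window reopens.
            buffered = 0
            in_flight = 0
        else:
            buffered += have_size
            in_flight += 1
            sent += 1
    return False
```

With a 64 KiB proxy buffer and a 32-have window of ~50-byte lines, the window fills long before the proxy ever flushes, so the toy model stalls; a proxy that forwards eagerly (tiny threshold) never does.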
On 2024-07-06 at 05:59:57, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
> >> Buffering the entire thing will break because ...?  Deadlock?  Or is
> >> there anything more subtle going on?
> >
> > When we use the smart HTTP protocol, the server sends keep-alive and
> > status messages as one of the data streams, which is important because
> > (a) the user is usually impatient and wants to know what's going on and
> > (b) it may take a long time to pack the data, especially for large
> > repositories, and sending no data may result in the connection being
> > dropped or the client being served a 500 by an intermediate layer.  We
> > know this does happen and I've seen reports of it.
>
> And this is an example of "a proxy that buffers the data, without
> modifying or tampering with, would still break transport"?

Yes.  The connection usually ends up dropped from the view of the
client, which is hard to debug (because it also looks like a network
problem, except often without any output from the remote side).