Message ID | cover.1589816718.git.liu.denton@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | remote-curl: fix deadlocks when remote server disconnects | expand |
On Mon, May 18, 2020 at 11:47:17AM -0400, Denton Liu wrote: > Changes since v1: > > * Remove fallthrough in switch in favour of just extracting the common > call out of the switch in patch 3 > > * Add more detail in function comment and use `const char linelen[4]` in > patch 4 > > * Implement most of Peff's suggestions[0] in patch 5 > > * Only operate on stateless_connect() in patch 5 > > * Add tests in patch 5 > > * Drop "remote-curl: ensure last packet is a flush" in favour of > "stateless-connect: send response end packet" Overall this looks pretty cleanly done. I left a few minor comments throughout, but the real question is whether we prefer the "0002" packet in the last one, or if we instead insist that the response end in a flush. At first glance, the "0002" seems like it's more flexible, because we're making fewer assumptions about what's being transferred over the stateless-connect channel. But in reality it still has to be pktlines (because we're checking them for incomplete or invalid packets already). So all it really buys us is that the server response doesn't have to end with a flush packet. So I dunno. The "0002" solution is slightly more flexible, but I'm not sure it helps in practice. And it does eat up one of our two remaining special packet markers. There is another solution, which would allow arbitrary data over stateless-connect: adding an extra level of pktline framing between the helper and the parent process. But that's rather ugly (inner pktlines may be split across outer ones, so you have to do a bunch of buffer reassembly). I think that's actually how v0 http works, IIRC. IMHO it probably isn't worth pursuing. That extra layer wrecks the elegance to the v2 stateless-connect approach, and we really do expect only pktlines to go over it. So I think either of your solutions (enforcing a final flush, or the 0002 packet) is preferable. I'm on the fence between them. -Peff
Hi Peff, On Mon, May 18, 2020 at 12:50:56PM -0400, Jeff King wrote: > On Mon, May 18, 2020 at 11:47:17AM -0400, Denton Liu wrote: > > > Changes since v1: > > > > * Remove fallthrough in switch in favour of just extracting the common > > call out of the switch in patch 3 > > > > * Add more detail in function comment and use `const char linelen[4]` in > > patch 4 > > > > * Implement most of Peff's suggestions[0] in patch 5 > > > > * Only operate on stateless_connect() in patch 5 > > > > * Add tests in patch 5 > > > > * Drop "remote-curl: ensure last packet is a flush" in favour of > > "stateless-connect: send response end packet" > > Overall this looks pretty cleanly done. I left a few minor comments > throughout, but the real question is whether we prefer the "0002" packet > in the last one, or if we instead insist that the response end in a > flush. Thanks for the prompt review! > At first glance, the "0002" seems like it's more flexible, because we're > making fewer assumptions about what's being transferred over the > stateless-connect channel. But in reality it still has to be pktlines > (because we're checking them for incomplete or invalid packets already). > So all it really buys us is that the server response doesn't have to end > with a flush packet. > > So I dunno. The "0002" solution is slightly more flexible, but I'm not > sure it helps in practice. And it does eat up one of our two remaining > special packet markers. Yeah, I was worried about consuming a special packet. One alternative that I considered but is kind of gross is sending something like "0028gitremote-helper: response complete\n" instead of "0002". Then, instead of "0002" checks, we can check for that special string instead. I don't _think_ that stateless-connect currently allows for completely arbitrary data but I might be mistaken. > There is another solution, which would allow arbitrary data over > stateless-connect: adding an extra level of pktline framing between the > helper and the parent process. But that's rather ugly (inner pktlines > may be split across outer ones, so you have to do a bunch of buffer > reassembly). I think that's actually how v0 http works, IIRC. > IMHO it probably isn't worth pursuing. That extra layer wrecks the > elegance to the v2 stateless-connect approach, and we really do expect > only pktlines to go over it. This was the original approach that I was working on but I found it to be much too invasive for my liking. (Also, I never actually managed to get it working ;) ) I think I gave up when I realised I had to insert reframing logic into index-pack and unpack-objects. > So I think either of your solutions (enforcing a final flush, or the > 0002 packet) is preferable. I'm on the fence between them. I'm mostly on the fence too. One advantage of 0002, however, is that a malicious server can't end a request with 0002 as that's explicitly prevented. If a malicious server closes a connection after sending a 0000, I think that they could cause a deadlock to happen if there are multiple flush packets expected in a response. You mentioned in a sibling email that this currently doesn't happen wrt stateless-connect although I'm not sure if in the future, this is something that might change. I dunno. > -Peff
Jeff King <peff@peff.net> writes: > Overall this looks pretty cleanly done. I left a few minor comments > throughout, but the real question is whether we prefer the "0002" packet > in the last one, or if we instead insist that the response end in a > flush. Thanks for a review. I was reading the series through and I found it a reasonably pleasant read. > So I think either of your solutions (enforcing a final flush, or the > 0002 packet) is preferable. I'm on the fence between them. I am undecided, too X-<.
On Mon, May 18, 2020 at 01:36:52PM -0400, Denton Liu wrote: > > So I dunno. The "0002" solution is slightly more flexible, but I'm not > > sure it helps in practice. And it does eat up one of our two remaining > > special packet markers. > > Yeah, I was worried about consuming a special packet. One alternative > that I considered but is kind of gross is sending something like > "0028gitremote-helper: response complete\n" instead of "0002". Then, > instead of "0002" checks, we can check for that special string instead. > I don't _think_ that stateless-connect currently allows for completely > arbitrary data but I might be mistaken. Yeah, I think we should avoid using any packet that could be confused with regular output. A well-functioning server should have sent a flush already, and so the worst case would probably be that we would mistake their "0028" packet for ours if the network happens to cut out right at that moment (and even that is obviously exceedingly unlikely). But it just seems easier to reason about if we know we can't get confused. > This was the original approach that I was working on but I found it to > be much too invasive for my liking. (Also, I never actually managed to > get it working ;) ) I think I gave up when I realised I had to insert > reframing logic into index-pack and unpack-objects. Yuck. :) I think v0 does it by unwrapping the extra layer in fetch-pack, but I'd have to double check. But I'm not going to, because I think we both agree that your other approaches are way nicer. > > So I think either of your solutions (enforcing a final flush, or the > > 0002 packet) is preferable. I'm on the fence between them. > > I'm mostly on the fence too. One advantage of 0002, however, is that a > malicious server can't end a request with 0002 as that's explicitly > prevented. If a malicious server closes a connection after sending a > 0000, I think that they could cause a deadlock to happen if there are > multiple flush packets expected in a response. You mentioned in a > sibling email that this currently doesn't happen wrt stateless-connect > although I'm not sure if in the future, this is something that might > change. I dunno. Yeah, agreed that the efficacy of the "must end in flush" strategy depends on there not being internal flushes. At least against a determined attacker. But it may also be less random than you might think, if the pattern is to send a flush, then do a bunch of work, etc. A "random" hangup or error might be more likely to happen at the flush points then (from the client's perspective). So let's see if we can answer that question with less hand-waving. The outer v2 capabilities conversation only writes out the capabilities list, followed by a single flush (the packet_flush() call in advertise_capabilities()). So far so good. The only two v2 commands defined in serve.c are "ls-refs" and "fetch". For "ls-refs", we end up in ls_refs(). That writes only regular packets via send_ref(), and then concludes with a single flush (we don't even send a delim; the client does that to specify options). Good. For "fetch", we end up in upload_pack_v2(). We write via process_haves_and_send_acks(), which will either: 1. Write nothing and return 1. 2. Write a delim and return 1. 3. Write a flush packet and return 0. In the final case, we jump to FETCH_DONE. So this really is the final thing we say. Good. In the first two cases, we jump to FETCH_SEND_PACK. We may write out wanted-ref info, followed by a delim. OK. Then shallow info, followed by a delim. OK. Then the actual packfile via create_pack_file(). There our write behavior depends on use_sideband. If not set, we just dump data directly. If set, then we packetize. And it will always be set for v2 (we do it unconditionally in the beginning of upload_pack_v2()). So we only need to care about use_sideband cases. And there we may send keepalives, but those are "0005\1" empty data packets. OK. We may send data via send_client_data(), but those will be real data packets. And then finally we give a single packet_flush(). Good. So I think we can say that yes, the protocol as designed will send a flush at the end of every response, and will not ever send another flush. _But_ that's if all goes well. If upload-pack sees an error, it may: - write an ERR packet in the earlier non-sideband steps, and then die - write a sideband 3 packet in later steps, and then die - just die() in some cases (e.g., pack-objects failed to start) Obviously these are all an error on the client. And that third one especially is probably a hanging case that we'd like to fix with this series. But in the other two, we'd definitely want to make sure that remote-curl doesn't complain too early or too loudly, and that the error is shown to the user. I think if remote-curl is just asking "did we see a flush packet at the end" then it is likely to complain unnecessarily in that case. And I don't think it should be inspecting packets for ERR or sideband-3. It doesn't know enough about the rest of the conversation to know which is correct. Though I guess we'd really only have to inspect the final packet. I really wish we had converted all ERRs into packet 0002 as part of the v2 conversion. Then somebody reading only at the pktline level could truly understand or proxy a full conversation without understanding anything about what's in the pktlines. But it's too late for that now. So I think our options are probably: 1. detect flush packets in remote-curl, and either: a. don't print an error, just hang up. That prevents a hang in the caller and produces no extra message on a real error. It may be less informative than it could be if the connection hangs up (though we may print a curl error message, and the caller will at least say "the helper hung up") b. like (a), but always print an error; this is your original patch, but I _suspect_ (but didn't test) that it would produce extra useless messages for errors the server reports c. between the two: inspect the final packet data for evidence of ERR/sideband 3 and suppress any message if found 2. helper signals end-of-response to caller (then it never produces a message itself; only the caller does, and it would abort on an ERR packet before then) a. using a special pktline (your "0002" patch) b. some other out-of-band mechanism (e.g., could be another fd) I think this is pushing me towards 2a, your "0002" patch. It sidesteps the error-message questions entirely (and I think 2b is too convoluted to be worth pursuing, especially on Windows where setting up extra pipes is tricky). But I'd also be OK with 1a or 1c. -Peff
Jeff King <peff@peff.net> writes: > So I think our options are probably: > > 1. detect flush packets in remote-curl, and either: > > a. don't print an error, just hang up. That prevents a hang in the > caller and produces no extra message on a real error. It may be > less informative than it could be if the connection hangs up > (though we may print a curl error message, and the caller will > at least say "the helper hung up") > > b. like (a), but always print an error; this is your original > patch, but I _suspect_ (but didn't test) that it would produce > extra useless messages for errors the server reports > > c. between the two: inspect the final packet data for evidence of > ERR/sideband 3 and suppress any message if found > > 2. helper signals end-of-response to caller (then it never produces a > message itself; only the caller does, and it would abort on an ERR > packet before then) > > a. using a special pktline (your "0002" patch) > > b. some other out-of-band mechanism (e.g., could be another fd) > > I think this is pushing me towards 2a, your "0002" patch. It sidesteps > the error-message questions entirely (and I think 2b is too convoluted > to be worth pursuing, especially on Windows where setting up extra pipes > is tricky). But I'd also be OK with 1a or 1c. Thanks for a detailed analysis. I guess we'd take 0002, then?
On Mon, May 18, 2020 at 03:52:42PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > So I think our options are probably: > > > > 1. detect flush packets in remote-curl, and either: > > > > a. don't print an error, just hang up. That prevents a hang in the > > caller and produces no extra message on a real error. It may be > > less informative than it could be if the connection hangs up > > (though we may print a curl error message, and the caller will > > at least say "the helper hung up") > > > > b. like (a), but always print an error; this is your original > > patch, but I _suspect_ (but didn't test) that it would produce > > extra useless messages for errors the server reports > > > > c. between the two: inspect the final packet data for evidence of > > ERR/sideband 3 and suppress any message if found > > > > 2. helper signals end-of-response to caller (then it never produces a > > message itself; only the caller does, and it would abort on an ERR > > packet before then) > > > > a. using a special pktline (your "0002" patch) > > > > b. some other out-of-band mechanism (e.g., could be another fd) > > > > I think this is pushing me towards 2a, your "0002" patch. It sidesteps > > the error-message questions entirely (and I think 2b is too convoluted > > to be worth pursuing, especially on Windows where setting up extra pipes > > is tricky). But I'd also be OK with 1a or 1c. > > Thanks for a detailed analysis. I guess we'd take 0002, then? Yeah, that's how I lean. I think I did have a few comments on the patch that I'd like Denton to consider, so hopefully we'll see a re-roll. -Peff