Message ID | 20240523131926.1959245-1-tom@compton.nu (mailing list archive) |
---|---|
State | Superseded |
Series | promisor-remote: add promisor.quiet configuration option |
Tom Hughes <tom@compton.nu> writes:

> Add a configuration optione to allow output from the promisor
> fetching objects to be suppressed/

"optione" -> "option", "suppressed/" -> "suppressed.".

> This allows us to stop commands like git blame being swamped
> with progress messages and gc notifications from the promisor
> when used in a partial clone.

"git blame" -> "'git blame'", perhaps.

It is an interesting observation. I thought "git blame" was quite
bad at streaming (i.e., until it learned the origin of each and
every line, it never produced any output the user asked for), which
actually would make it a non issue that the output the user wanted
gets mixed with the progress messages and other garbage. Unless the
user understands that "git blame" is not spending time itself, but
is waiting for necessary blobs to be fetched from the promisor, and
is expected to wait unusually longer than the fully local case,
having to stare at a blank/unchanging screen would make it uneasy
for the end-user and that is why we have progress eye-candy.

I am OK for promisor.quiet being optional, but I am torn when I
imagine what comes next. On one hand, I myself probably would find
it neat to make these lazy fetches happen completely silently as if
nothing strange is happening from the point of view of end-users
(except for some operations may be unusually slow compared to fully
local repository). On the other hand, I suspect people will be
tempted to push it to be on by default at which time it may hurt
unsuspecting (new) users who may have been helped by progress bars.

> diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
> new file mode 100644
> index 0000000000..98c5cb2ec2
> --- /dev/null
> +++ b/Documentation/config/promisor.txt
> @@ -0,0 +1,3 @@
> +promisor.quiet::
> +	If set to "true" assume `--quiet` when fetching additional
> +	objects for a partial clone.

OK.

> diff --git a/promisor-remote.c b/promisor-remote.c

The implementation is absolutely trivial and straight-forward.

> +test_expect_success TTY 'promisor.quiet=false works' '

Do not say "works"---recall the best practice of writing good bug
reports. State your expectation more explicitly, e.g. "shows
progress messages" or somesuch.

> +	rm -rf server server2 repo &&
> +	rm -rf server server3 repo &&

Why remove the same thing twice?

> +	test_create_repo server &&
> +	test_commit -C server foo &&
> +	git -C server repack -a -d --write-bitmap-index &&
> +
> +	git clone "file://$(pwd)/server" repo &&
> +	git hash-object repo/foo.t >blobhash &&

Do you need a temporary file, or would

	blobhash=$(git hash-object repo/foo.t) &&

work just fine? Of course you'd later have to say

	... git -C repo cat-file -p $blobhash

instead of "$(cat blobhash)".

Even simpler, I wonder if you can remove this hash-object
invocation, and then do

	... git -C repo cat-file -p :foo.t

> +	rm -rf repo/.git/objects/* &&

This! IS! BAD! for the reason stated later ...

> +	git -C server config uploadpack.allowanysha1inwant 1 &&
> +	git -C server config uploadpack.allowfilter 1 &&

... but these are OK and expected, ...

> +	git -C repo config core.repositoryformatversion 1 &&
> +	git -C repo config extensions.partialclone "origin" &&

... and this is way too different from what would happen in the real
life. I'd prefer not to see manual destruction of $GIT_DIR/objects/*
or manual futzing of repository format version and extensions.
These configuration variables are *NOT* for end-users to futz with,
and the tests should not be doing so either.

Can't we prepare the "repo" only by creating a partial clone in the
usual way?

> +	git -C repo config promisor.quiet "false" &&

This of course is good, as this is what the test wants to check.

> +	test_terminal git -C repo cat-file -p $(cat blobhash) 2>err &&

It seems that exactly the same set of comments apply to the next
one, so I'll refrain from repeating myself.

Thanks.

> +test_expect_success TTY 'promisor.quiet=true works' '
> +	rm -rf server server2 repo &&
> +	rm -rf server server3 repo &&
> +	test_create_repo server &&
> +	test_commit -C server foo &&
> +	git -C server repack -a -d --write-bitmap-index &&
> +
> +	git clone "file://$(pwd)/server" repo &&
> +	git hash-object repo/foo.t >blobhash &&
> +	rm -rf repo/.git/objects/* &&
> +
> +	git -C server config uploadpack.allowanysha1inwant 1 &&
> +	git -C server config uploadpack.allowfilter 1 &&
> +	git -C repo config core.repositoryformatversion 1 &&
> +	git -C repo config extensions.partialclone "origin" &&
> +	git -C repo config promisor.quiet "true" &&
> +
> +	test_terminal git -C repo cat-file -p $(cat blobhash) 2>err &&
> +
> +	# Ensure that no progress messages are written
> +	! grep "Receiving objects" err
> +'
> +
>  . "$TEST_DIRECTORY"/lib-httpd.sh
>  start_httpd
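One way the setup could look if the partial clone is created the usual way, as the review asks, is sketched below. This is an illustration only, not the revised patch: it runs as a plain shell script outside Git's test harness, the temporary directory and commit identity are made up, and it reads the blob as `HEAD:foo.t` rather than `:foo.t` because the sketch clones with `--no-checkout` and so has no index. It does not check progress output, since that needs a terminal (which `test_terminal` provides in the real test suite).

```shell
#!/bin/sh
# Sketch: prepare a partial clone without deleting objects by hand
# or editing core.repositoryformatversion / extensions.partialclone.
set -e

work=$(mktemp -d)        # hypothetical scratch directory
cd "$work"

# A tiny "server" repository with a single file, foo.t.
git init -q server
echo foo >server/foo.t
git -C server add foo.t
git -C server -c user.name=Example -c user.email=example@example.com \
    commit -q -m foo

# The server must allow filtered fetches and fetches of arbitrary objects.
git -C server config uploadpack.allowfilter 1
git -C server config uploadpack.allowanysha1inwant 1

# --filter=blob:none leaves blobs out of the clone; --no-checkout keeps
# the clone from faulting them in immediately.
git clone -q --no-checkout --filter=blob:none "file://$work/server" repo

# The setting under test; Git versions without the patch ignore it.
git -C repo config promisor.quiet true

# Reading the blob triggers a lazy fetch from the promisor remote.
content=$(git -C repo cat-file -p HEAD:foo.t)
echo "$content"
```

Because `git clone --filter` records the promisor remote and the repository format extension itself, none of the manual configuration objected to in the review is needed.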
On 23/05/2024 23:23, Junio C Hamano wrote:

> It is an interesting observation. I thought "git blame" was quite
> bad at streaming (i.e., until it learned the origin of each and
> every line, it never produced any output the user asked for), which
> actually would make it a non issue that the output the user wanted
> gets mixed with the progress messages and other garbage. Unless the
> user understands that "git blame" is not spending time itself, but
> is waiting for necessary blobs to be fetched from the promisor, and
> is expected to wait unusually longer than the fully local case,
> having to stare at a blank/unchanging screen would make it uneasy
> for the end-user and that is why we have progress eye-candy.

Blame actually has its own progress message that counts the number
of lines analysed, which gets interrupted by the progress messages
from the promisor.

Something like "git log -S" behaves a bit differently - it doesn't
have progress, and because it's using a pager by default the
promisor progress is suppressed, since stderr is no longer a
terminal, but you do still get lots of background gc notifications.

> I am OK for promisor.quiet being optional, but I am torn when I
> imagine what comes next. On one hand, I myself probably would find
> it neat to make these lazy fetches happen completely silently as if
> nothing strange is happening from the point of view of end-users
> (except for some operations may be unusually slow compared to fully
> local repository). On the other hand, I suspect people will be
> tempted to push it to be on by default at which time it may hurt
> unsuspecting (new) users who may have been helped by progress bars.

I do agree that it's hard to know what the right thing to do is
here or even to know the full scope of the effect.

I'll update the patch to address the specific review comments.

Tom
On Thu, May 23, 2024 at 02:19:26PM +0100, Tom Hughes wrote:

> Add a configuration optione to allow output from the promisor
> fetching objects to be suppressed/
>
> This allows us to stop commands like git blame being swamped
> with progress messages and gc notifications from the promisor
> when used in a partial clone.

I'm not at all opposed to providing a way to suppress this, but I feel
like in the long run, the more fundamental issue is that git-blame kicks
off a zillion fetches as it traverses. That's not only ugly but it's
also horribly inefficient.

In an ideal world we'd queue all of the blobs we need, do a single
fetch, and then compute the blame on the result. That's probably easier
said than done, though we have done it in other spots (e.g., for
checkout).

In terms of user experience, you can simulate it with something like:

  # fault in all of the necessary blobs in one batch
  git rev-list HEAD -- $file |
  git diff-tree --stdin --format= -r --diff-filter=d -m --raw -- $file |
  awk '{print $4}' |
  git -c fetch.negotiationAlgorithm=noop \
    fetch --no-tags --no-write-fetch-head --recurse-submodules=no \
          --filter=blob:none --stdin

  git blame $file

Obviously that command is horrid and not something users should have to
care about. But if we had some way for blame to say "hey, I am
traversing from X..Y, looking at these pathspecs", then our first
lazy-fetch could try to grab all of them. And I think the same would be
the case for "git log -p", and so on.

Doing a separate traversal isn't maximally efficient, but it might not
be too bad in practice (and we could even do partial traversals to
balance chunking versus responsiveness, though in the case of
non-incremental blame we need everything before we generate an answer
anyway).

But anyway, I bring it up here because I think once we reach that end
state, it won't be as interesting to turn off the fetch progress.

-Peff
On 25/05/2024 06:29, Jeff King wrote:

> On Thu, May 23, 2024 at 02:19:26PM +0100, Tom Hughes wrote:
>
>> Add a configuration optione to allow output from the promisor
>> fetching objects to be suppressed/
>>
>> This allows us to stop commands like git blame being swamped
>> with progress messages and gc notifications from the promisor
>> when used in a partial clone.
>
> I'm not at all opposed to providing a way to suppress this, but I feel
> like in the long run, the more fundamental issue is that git-blame kicks
> off a zillion fetches as it traverses. That's not only ugly but it's
> also horribly inefficient.

This is true. One thing I found that makes things a lot more
efficient if you're using ssh as the transport is to enable
persistent multiplexing in .ssh/config with something like:

  Host git.example.com
    ControlMaster auto
    ControlPath /run/user/%i/ssh/control.%C
    ControlPersist 1m
    SendEnv GIT_PROTOCOL

which avoids each fetch having to setup and authenticate a
new ssh session.

> In an ideal world we'd queue all of the blobs we need, do a single
> fetch, and then compute the blame on the result. That's probably easier
> said than done, though we have done it in other spots (e.g., for
> checkout).

That would certainly be an excellent improvement, yes.

Tom
On Sat, May 25, 2024 at 11:29:13AM +0100, Tom Hughes wrote:

> > I'm not at all opposed to providing a way to suppress this, but I feel
> > like in the long run, the more fundamental issue is that git-blame kicks
> > off a zillion fetches as it traverses. That's not only ugly but it's
> > also horribly inefficient.
>
> This is true. One thing I found that makes things a lot more
> efficient if you're using ssh as the transport is to enable
> persistent multiplexing in .ssh/config with something like:
>
>   Host git.example.com
>     ControlMaster auto
>     ControlPath /run/user/%i/ssh/control.%C
>     ControlPersist 1m
>     SendEnv GIT_PROTOCOL
>
> which avoids each fetch having to setup and authenticate a
> new ssh session.

Good point. That is sort of the opposite approach of my suggestion. That
is, I was suggesting that git-blame batch everything to make a single
efficient request. But if we could reduce the cost of making individual
requests, then we wouldn't need to batch (which is quite a lot simpler).

The ssh session is going to be one source of latency and overhead. But
just spawning the fetch and remote upload-pack are another (especially
if you have to authenticate, and especially with the v2 protocol, which
has an extra round-trip for capabilities upgrade).

If there was a long-running mode to git-fetch where it kept open a v2
session to the server and just said "hey, send me object X" and then
"OK, now send me object Y" that would eliminate all of that overhead
(and even for http, under the hood curl is good at keeping the session
open between requests). You'd still have some extra latency (while
you're talking to the server, the local blame process is paused), but I
suspect it would be a lot more tolerable.

And now your progress question is re-opened again. You might want a more
succinct progress for something like blame that still does all of its
fetching before generating output. E.g., you might want a single
progress line with the current state (fetching or not), the count of
fetched objects, the speed, and so on. And for something like "git log
-p", where the progress would be interspersed with actual output, you
might want to suppress it entirely.

So yeah, I have no real objection to what your patch is doing. Depending
on how future work unfolds it might be more or less useful than it is
now, but even in the worst case it probably won't be a bad thing to have
in our toolbox.

-Peff
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 70b448b132..6cae835db9 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -487,6 +487,8 @@ include::config/pager.txt[]
 
 include::config/pretty.txt[]
 
+include::config/promisor.txt[]
+
 include::config/protocol.txt[]
 
 include::config/pull.txt[]
diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
new file mode 100644
index 0000000000..98c5cb2ec2
--- /dev/null
+++ b/Documentation/config/promisor.txt
@@ -0,0 +1,3 @@
+promisor.quiet::
+	If set to "true" assume `--quiet` when fetching additional
+	objects for a partial clone.
diff --git a/promisor-remote.c b/promisor-remote.c
index b414922c44..2ca7c2ae48 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -23,6 +23,7 @@ static int fetch_objects(struct repository *repo,
 	struct child_process child = CHILD_PROCESS_INIT;
 	int i;
 	FILE *child_in;
+	int quiet;
 
 	if (git_env_bool(NO_LAZY_FETCH_ENVIRONMENT, 0)) {
 		static int warning_shown;
@@ -41,6 +42,8 @@ static int fetch_objects(struct repository *repo,
 		     "fetch", remote_name, "--no-tags",
 		     "--no-write-fetch-head", "--recurse-submodules=no",
 		     "--filter=blob:none", "--stdin", NULL);
+	if (!git_config_get_bool("promisor.quiet", &quiet) && quiet)
+		strvec_push(&child.args, "--quiet");
 	if (start_command(&child))
 		die(_("promisor-remote: unable to fork off fetch subprocess"));
 	child_in = xfdopen(child.in, "w");
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 88a66f0904..99257c3792 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -3,6 +3,7 @@ test_description='partial clone'
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-terminal.sh
 
 # missing promisor objects cause repacks which write bitmaps to fail
 GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
@@ -689,6 +690,52 @@ test_expect_success 'lazy-fetch when accessing object not in the_repository' '
 	! grep "[?]$FILE_HASH" out
 '
 
+test_expect_success TTY 'promisor.quiet=false works' '
+	rm -rf server server2 repo &&
+	rm -rf server server3 repo &&
+	test_create_repo server &&
+	test_commit -C server foo &&
+	git -C server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/server" repo &&
+	git hash-object repo/foo.t >blobhash &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C server config uploadpack.allowanysha1inwant 1 &&
+	git -C server config uploadpack.allowfilter 1 &&
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "origin" &&
+	git -C repo config promisor.quiet "false" &&
+
+	test_terminal git -C repo cat-file -p $(cat blobhash) 2>err &&
+
+	# Ensure that progress messages are written
+	grep "Receiving objects" err
+'
+
+test_expect_success TTY 'promisor.quiet=true works' '
+	rm -rf server server2 repo &&
+	rm -rf server server3 repo &&
+	test_create_repo server &&
+	test_commit -C server foo &&
+	git -C server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/server" repo &&
+	git hash-object repo/foo.t >blobhash &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C server config uploadpack.allowanysha1inwant 1 &&
+	git -C server config uploadpack.allowfilter 1 &&
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "origin" &&
+	git -C repo config promisor.quiet "true" &&
+
+	test_terminal git -C repo cat-file -p $(cat blobhash) 2>err &&
+
+	# Ensure that no progress messages are written
+	! grep "Receiving objects" err
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
Add a configuration optione to allow output from the promisor
fetching objects to be suppressed/

This allows us to stop commands like git blame being swamped
with progress messages and gc notifications from the promisor
when used in a partial clone.

Signed-off-by: Tom Hughes <tom@compton.nu>
---
 Documentation/config.txt          |  2 ++
 Documentation/config/promisor.txt |  3 ++
 promisor-remote.c                 |  3 ++
 t/t0410-partial-clone.sh          | 47 +++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+)
 create mode 100644 Documentation/config/promisor.txt
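For illustration, a user might turn the proposed option on roughly as follows. This is a sketch under assumptions: the scratch repository is hypothetical, and the setting only changes behaviour in a Git build that includes this patch (other builds store the value and ignore it).

```shell
#!/bin/sh
# Sketch: enable promisor.quiet for one repository and read it back.
set -e

repo=$(mktemp -d)        # hypothetical scratch repository
git init -q "$repo"

# Store the setting in the repository's own configuration.
git -C "$repo" config promisor.quiet true

# Read it back, normalized to a boolean.
quiet=$(git -C "$repo" config --type=bool --get promisor.quiet)
echo "promisor.quiet=$quiet"
```

The same setting could be supplied for a single command with `git -c promisor.quiet=true blame <file>`, leaving the stored configuration untouched.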