Message ID | 2cbf0b611133df5fa7eed1bf38460f9d119d2a6e.1630359290.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Upstreaming the Scalar command | expand |
On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote: > This implements Scalar's opinionated `clone` command: it tries to use a > partial clone and sets up a sparse checkout by default. In contrast to > `git clone`, `scalar clone` sets up the worktree in the `src/` > subdirectory, to encourage a separation between the source files and the > build output (which helps Git tremendously because it avoids untracked > files that have to be specifically ignored when refreshing the index). Perhaps nobody else wondered this while reading this, but I thought this might be some sparse/worktree magic where cloning into "foo" would have "foo/.git", but the worktree was somehow magically mapped at foo/src/". But no, it just takes your "scalar clone <url> foo" and translates it to "foo/src", so you'll get a directory at "foo". > Note: We intentionally use a slightly wasteful `set_config()` function > (which does not reuse a single `strbuf`, for example, though performance > _really_ does not matter here) for convenience and readability. FWIW I think the commit message could do without this, that part of the code is obviously not performance sensitive at all. But maybe an explicit note helps anyway...
On Tue, Aug 31, 2021 at 8:04 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote: > > Note: We intentionally use a slightly wasteful `set_config()` function > > (which does not reuse a single `strbuf`, for example, though performance > > _really_ does not matter here) for convenience and readability. > > FWIW I think the commit message could do without this, that part of the > code is obviously not performance sensitive at all. But maybe an > explicit note helps anyway... FWIW, I also found this distracting; it takes the reader's attention away from more important aspects of the patch. (But it alone is not worth a re-roll; it was just a minor hiccup.)
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> writes: > +static char *remote_default_branch(const char *url) > +{ > + struct child_process cp = CHILD_PROCESS_INIT; > + struct strbuf out = STRBUF_INIT; > + > + cp.git_cmd = 1; > + strvec_pushl(&cp.args, "ls-remote", "--symref", url, "HEAD", NULL); > + strbuf_addstr(&out, "-\n"); Is this a workaround for the problem that the first "ref:" line won't be found by looking for "\nref: " in the loop? Cute, but the extra "-" is a bit misleading. > + if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) { > + char *ref = out.buf; > + > + while ((ref = strstr(ref + 1, "\nref: "))) { > + const char *p; > + char *head, *branch; > + > + ref += strlen("\nref: "); > + head = strstr(ref, "\tHEAD"); > + > + if (!head || memchr(ref, '\n', head - ref)) > + continue; OK. We expect "ref: " <refname> "\t" <head> "\n" where <head> is "HEAD" for their .git/HEAD and refs/remotes/<nick>/HEAD for their remote-tracking branch for the remote they call <nick>, on a single line. We reject a line that is not of that shape, and we reject a line that is about remote-tracking branch by only looking for "\tHEAD". Makes sense. The strstr() goes from "ref + 1", which feels sloppy. When we reject the line we found that begins with "ref :", I would have expected that the next scan would start at the beginning of the next line, not from the middle of this line at the first letter 'e' in 'refs/heads/' on the current line "ref: refs/heads/.....". As long as the current line is long enough, strstr() would not miss the beginning of the next line, so it might be OK. > + if (skip_prefix(ref, "refs/heads/", &p)) { > + branch = xstrndup(p, head - p); > + strbuf_release(&out); > + return branch; > + } > + > + error(_("remote HEAD is not a branch: '%.*s'"), > + (int)(head - ref), ref); > + strbuf_release(&out); > + return NULL; OK. Any symref whose basename is HEAD in their remote-tracking hierarchy would have been rejected earlier in the loop. Is there a particular reason why we return early here, instead of breaking out of hte loop and let the generic "failed to get" code path below to handle this case? > + } > + } > + warning(_("failed to get default branch name from remote; " > + "using local default")); > + strbuf_reset(&out); > + > + child_process_init(&cp); > + cp.git_cmd = 1; > + strvec_pushl(&cp.args, "symbolic-ref", "--short", "HEAD", NULL); > + if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) { > + strbuf_trim(&out); > + return strbuf_detach(&out, NULL); > + } > + > + strbuf_release(&out); > + error(_("failed to get default branch name")); > + return NULL; > +} > +static int cmd_clone(int argc, const char **argv) > +{ > + const char *branch = NULL; > + int full_clone = 0; > + struct option clone_options[] = { > + OPT_STRING('b', "branch", &branch, N_("<branch>"), > + N_("branch to checkout after clone")), > + OPT_BOOL(0, "full-clone", &full_clone, > + N_("when cloning, create full working directory")), > + OPT_END(), > + }; > + const char * const clone_usage[] = { > + N_("scalar clone [<options>] [--] <repo> [<dir>]"), > + NULL > + }; > + const char *url; > + char *enlistment = NULL, *dir = NULL; > + struct strbuf buf = STRBUF_INIT; > + int res; > + > + argc = parse_options(argc, argv, NULL, clone_options, clone_usage, 0); > + > + if (argc == 2) { > + url = argv[0]; > + enlistment = xstrdup(argv[1]); > + } else if (argc == 1) { > + url = argv[0]; > + > + strbuf_addstr(&buf, url); > + /* Strip trailing slashes, if any */ > + while (buf.len > 0 && is_dir_sep(buf.buf[buf.len - 1])) > + strbuf_setlen(&buf, buf.len - 1); > + /* Strip suffix `.git`, if any */ > + strbuf_strip_suffix(&buf, ".git"); > + > + enlistment = find_last_dir_sep(buf.buf); > + if (!enlistment) { > + die(_("cannot deduce worktree name from '%s'"), url); > + } > + enlistment = xstrdup(enlistment + 1); > + } else { > + usage_msg_opt(_("You must specify a repository to clone."), > + clone_usage, clone_options); > + } > + > + if (is_directory(enlistment)) > + die(_("directory '%s' exists already"), enlistment); > + > + dir = xstrfmt("%s/src", enlistment); > + > + strbuf_reset(&buf); > + if (branch) > + strbuf_addf(&buf, "init.defaultBranch=%s", branch); > + else { > + char *b = repo_default_branch_name(the_repository, 1); > + strbuf_addf(&buf, "init.defaultBranch=%s", b); > + free(b); Doesn't "git clone" already use their HEAD without having to make an extra "git ls-remote" roundtrip? Ahh, you do not do "git clone"; you do "git init", set things up, and then "git fetch" and checkout, all manually. Which is kind of shame. I wonder if it is a cleaner implementation to give a new option to "git clone" that gives a command sequence (not necessarily have to be implemented as a shell script) that specifies necessary pre-configuration steps to be done before the command starts the transfer step.
On 9/1/2021 12:45 PM, Junio C Hamano wrote: > "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> > writes: > ... >> + dir = xstrfmt("%s/src", enlistment); >> + >> + strbuf_reset(&buf); >> + if (branch) >> + strbuf_addf(&buf, "init.defaultBranch=%s", branch); >> + else { >> + char *b = repo_default_branch_name(the_repository, 1); >> + strbuf_addf(&buf, "init.defaultBranch=%s", b); >> + free(b); > > Doesn't "git clone" already use their HEAD without having to make an > extra "git ls-remote" roundtrip? > > Ahh, you do not do "git clone"; you do "git init", set things up, > and then "git fetch" and checkout, all manually. > > Which is kind of shame. > > I wonder if it is a cleaner implementation to give a new option to > "git clone" that gives a command sequence (not necessarily have to > be implemented as a shell script) that specifies necessary > pre-configuration steps to be done before the command starts the > transfer step. I agree that 'git clone' plus maybe some more improvements like '--sparse=cone' to set up cone-mode sparse-checkout would be good. And also the implementation being contributed here is cleaner if we can use 'git clone'. We are trying to balance a clean upstream implementation with some custom things that we still need in our microsoft/git fork to handle the integration with the GVFS Protocol (i.e. partial clone on Azure Repos). That customization is cleaner to keep here in the scalar code instead of adding an option to 'git clone'. It is difficult to justify code patterns here due to choices we have made in our fork, so I _could_ see a way to replace those custom bits with new, custom flags to 'git clone'. It just requires additional investment during our integration when we incorporate these upstream changes. Naturally, I'm motivated to avoid that extra work. If your opinion to switch to 'git clone' is a strong one, then I could see us doing that change. I just want you to be aware of the hidden reasons for choices like these. Thanks, -Stolee
Hi Junio, On Wed, 1 Sep 2021, Junio C Hamano wrote: > "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> > writes: > > > +static char *remote_default_branch(const char *url) > > +{ > > + struct child_process cp = CHILD_PROCESS_INIT; > > + struct strbuf out = STRBUF_INIT; > > + > > + cp.git_cmd = 1; > > + strvec_pushl(&cp.args, "ls-remote", "--symref", url, "HEAD", NULL); > > + strbuf_addstr(&out, "-\n"); > > Is this a workaround for the problem that the first "ref:" line > won't be found by looking for "\nref: " in the loop? Cute, but the > extra "-" is a bit misleading. The `-` is actually needed because of the `ref + 1` below, over which you stumbled. > > > + if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) { > > + char *ref = out.buf; > > + > > + while ((ref = strstr(ref + 1, "\nref: "))) { > > + const char *p; > > + char *head, *branch; > > + > > + ref += strlen("\nref: "); > > + head = strstr(ref, "\tHEAD"); > > + > > + if (!head || memchr(ref, '\n', head - ref)) > > + continue; > > OK. We expect "ref: " <refname> "\t" <head> "\n" where <head> is > "HEAD" for their .git/HEAD and refs/remotes/<nick>/HEAD for their > remote-tracking branch for the remote they call <nick>, on a single > line. We reject a line that is not of that shape, and we reject a > line that is about remote-tracking branch by only looking for > "\tHEAD". Makes sense. > > The strstr() goes from "ref + 1", which feels sloppy. I would use a different adjective, one that is less judgemental in nature, but then, you were talking about your feelings. > When we reject the line we found that begins with "ref :", I would have > expected that the next scan would start at the beginning of the next > line, not from the middle of this line at the first letter 'e' in > 'refs/heads/' on the current line "ref: refs/heads/.....". As long as > the current line is long enough, strstr() would not miss the beginning > of the next line, so it might be OK. It would even work if the current line is shorter, but as you point out: it is wasteful. And it could be improved to be more readable. I reworked it, and it now looks like this: if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) { const char *line = out.buf; while (*line) { const char *eol = strchrnul(line, '\n'), *p; size_t len = eol - line; char *branch; if (!skip_prefix(line, "ref: ", &p) || !strip_suffix_mem(line, &len, "\tHEAD")) { line = eol + (*eol == '\n'); continue; } eol = line + len; if (skip_prefix(p, "refs/heads/", &p)) { branch = xstrndup(p, eol - p); strbuf_release(&out); return branch; } error(_("remote HEAD is not a branch: '%.*s'"), (int)(eol - p), p); strbuf_release(&out); return NULL; } } It now parses the output line by line, looking for the expected prefix and suffix, then verifies the ref name format, and either returns the short branch name or errors out with the message that this is not a branch. > > > + if (skip_prefix(ref, "refs/heads/", &p)) { > > + branch = xstrndup(p, head - p); > > + strbuf_release(&out); > > + return branch; > > + } > > + > > + error(_("remote HEAD is not a branch: '%.*s'"), > > + (int)(head - ref), ref); > > + strbuf_release(&out); > > + return NULL; > > OK. Any symref whose basename is HEAD in their remote-tracking > hierarchy would have been rejected earlier in the loop. > > Is there a particular reason why we return early here, instead of > breaking out of hte loop and let the generic "failed to get" code > path below to handle this case? Yes, the reason is that I wanted to err on the side of caution. If the remote repository reports a default branch that is not a default branch at all, I do not want to pretend that things are fine and then run into trouble later when we set up a non-branch as remote-tracking target or something like that. > > > + } > > + } > > + warning(_("failed to get default branch name from remote; " > > + "using local default")); > > + strbuf_reset(&out); > > + > > + child_process_init(&cp); > > + cp.git_cmd = 1; > > + strvec_pushl(&cp.args, "symbolic-ref", "--short", "HEAD", NULL); > > + if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) { > > + strbuf_trim(&out); > > + return strbuf_detach(&out, NULL); > > + } > > + > > + strbuf_release(&out); > > + error(_("failed to get default branch name")); > > + return NULL; > > +} > > > +static int cmd_clone(int argc, const char **argv) > > +{ > > + const char *branch = NULL; > > + int full_clone = 0; > > + struct option clone_options[] = { > > + OPT_STRING('b', "branch", &branch, N_("<branch>"), > > + N_("branch to checkout after clone")), > > + OPT_BOOL(0, "full-clone", &full_clone, > > + N_("when cloning, create full working directory")), > > + OPT_END(), > > + }; > > + const char * const clone_usage[] = { > > + N_("scalar clone [<options>] [--] <repo> [<dir>]"), > > + NULL > > + }; > > + const char *url; > > + char *enlistment = NULL, *dir = NULL; > > + struct strbuf buf = STRBUF_INIT; > > + int res; > > + > > + argc = parse_options(argc, argv, NULL, clone_options, clone_usage, 0); > > + > > + if (argc == 2) { > > + url = argv[0]; > > + enlistment = xstrdup(argv[1]); > > + } else if (argc == 1) { > > + url = argv[0]; > > + > > + strbuf_addstr(&buf, url); > > + /* Strip trailing slashes, if any */ > > + while (buf.len > 0 && is_dir_sep(buf.buf[buf.len - 1])) > > + strbuf_setlen(&buf, buf.len - 1); > > + /* Strip suffix `.git`, if any */ > > + strbuf_strip_suffix(&buf, ".git"); > > + > > + enlistment = find_last_dir_sep(buf.buf); > > + if (!enlistment) { > > + die(_("cannot deduce worktree name from '%s'"), url); > > + } > > + enlistment = xstrdup(enlistment + 1); > > + } else { > > + usage_msg_opt(_("You must specify a repository to clone."), > > + clone_usage, clone_options); > > + } > > + > > + if (is_directory(enlistment)) > > + die(_("directory '%s' exists already"), enlistment); > > + > > + dir = xstrfmt("%s/src", enlistment); > > + > > + strbuf_reset(&buf); > > + if (branch) > > + strbuf_addf(&buf, "init.defaultBranch=%s", branch); > > + else { > > + char *b = repo_default_branch_name(the_repository, 1); > > + strbuf_addf(&buf, "init.defaultBranch=%s", b); > > + free(b); > > Doesn't "git clone" already use their HEAD without having to make an > extra "git ls-remote" roundtrip? > > Ahh, you do not do "git clone"; you do "git init", set things up, > and then "git fetch" and checkout, all manually. > > Which is kind of shame. > > I wonder if it is a cleaner implementation to give a new option to > "git clone" that gives a command sequence (not necessarily have to > be implemented as a shell script) that specifies necessary > pre-configuration steps to be done before the command starts the > transfer step. Right. It is a shame, I agree. And it is one of the things I want to work on, after the Scalar patch series made it into Git. The reason why I don't want to work on this now is that I expect this effort to result in new options for `git clone`, new options that need to be designed well, and where I fully expect a long discussion until we reach a consensus how these options should look like, especially since we will need to maintain backwards-compatibility of Scalar's functionality. Therefore, in the interest to keep the patch series relatively easy to review, I left this in the "for later" pile, for now. Ciao, Dscho
Hi Eric, On Tue, 31 Aug 2021, Eric Sunshine wrote: > On Tue, Aug 31, 2021 at 8:04 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: > > On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote: > > > Note: We intentionally use a slightly wasteful `set_config()` function > > > (which does not reuse a single `strbuf`, for example, though performance > > > _really_ does not matter here) for convenience and readability. > > > > FWIW I think the commit message could do without this, that part of the > > code is obviously not performance sensitive at all. But maybe an > > explicit note helps anyway... > > FWIW, I also found this distracting; it takes the reader's attention > away from more important aspects of the patch. (But it alone is not > worth a re-roll; it was just a minor hiccup.) Since I reworked the remote default branch parsing anyway, I removed this paragraph from the commit message. Ciao, Dscho
Derrick Stolee <stolee@gmail.com> writes: >> Ahh, you do not do "git clone"; you do "git init", set things up, >> and then "git fetch" and checkout, all manually. >> >> Which is kind of shame. >> >> I wonder if it is a cleaner implementation to give a new option to >> "git clone" that gives a command sequence (not necessarily have to >> be implemented as a shell script) that specifies necessary >> pre-configuration steps to be done before the command starts the >> transfer step. > > I agree that 'git clone' plus maybe some more improvements like > '--sparse=cone' to set up cone-mode sparse-checkout would be good. > And also the implementation being contributed here is cleaner if > we can use 'git clone'. > > We are trying to balance a clean upstream implementation with some > custom things that we still need in our microsoft/git fork to > handle the integration with the GVFS Protocol (i.e. partial clone > on Azure Repos). That customization is cleaner to keep here in the > scalar code instead of adding an option to 'git clone'. Oh, there is no disagreement on that point, at least in the short term. I was wondering why "clone" subcommand needs a duplicated logic that should be unnecessary, before realizing that this was not implemented as a wrapper to (possibly updated) "clone", and I agree that starting with a looser coupling like this step does is easier to everybody. "Kind of shame" is just that I wished we had already prepared "git clone" side to accept customization more easily before its various distinct phases (new repository creation, where a custom logic may want to affect the name and location of it and how "git init" is driven, initial "fetch", where a custom logic may want to affect the fetch refspec and its parameters like depths and cones, and initial "checkout") do their things. If we allowed such plug-in of logic to affect how "git clone" worked already, it would have been possible to do "scalar clone" with much less code. It also would allow us to reorganize the "clone --local" hack in a way that is easier to reason about (I think even in today's code, the way I hooked it up can be seen which is quite messy). It may even help folks who want to extend "git clone" to clone a repository recursively its submodules with project-specific customizations (like which ones to clone by default, etc.). I suspect that learning from the way "scalar clone" is done on top of "init" + "fetch" + "checkout" in this initial series may help us extend "git clone" later to fill such needs. > If your opinion to switch to 'git clone' is a strong one, then I > could see us doing that change. I just want you to be aware of the > hidden reasons for choices like these. Not at all at this moment. It is mostly that the way "init" + "fetch" + "checkout" was done in this step reminded me of a much longer-term wish I have had for a while.
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > It would even work if the current line is shorter, but as you point out: > it is wasteful. And it could be improved to be more readable. I reworked > it, and it now looks like this: > > if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) { > const char *line = out.buf; > > while (*line) { > const char *eol = strchrnul(line, '\n'), *p; > size_t len = eol - line; > char *branch; > > if (!skip_prefix(line, "ref: ", &p) || > !strip_suffix_mem(line, &len, "\tHEAD")) { > line = eol + (*eol == '\n'); > continue; > } > > eol = line + len; > if (skip_prefix(p, "refs/heads/", &p)) { > branch = xstrndup(p, eol - p); > strbuf_release(&out); > return branch; > } > > error(_("remote HEAD is not a branch: '%.*s'"), > (int)(eol - p), p); > strbuf_release(&out); > return NULL; > } > } > > It now parses the output line by line, looking for the expected prefix and > suffix, then verifies the ref name format, and either returns the short > branch name or errors out with the message that this is not a branch. It is much easier to read and understand how the loop works with above. >> > + error(_("remote HEAD is not a branch: '%.*s'"), >> > + (int)(head - ref), ref); >> > + strbuf_release(&out); >> > + return NULL; >> >> OK. Any symref whose basename is HEAD in their remote-tracking >> hierarchy would have been rejected earlier in the loop. >> >> Is there a particular reason why we return early here, instead of >> breaking out of hte loop and let the generic "failed to get" code >> path below to handle this case? > > Yes, the reason is that I wanted to err on the side of caution. If the > remote repository reports a default branch that is not a default branch at > all, I do not want to pretend that things are fine and then run into > trouble later when we set up a non-branch as remote-tracking target or > something like that. Wouldn't we have the same problem when the remote end does not advertise HEAD and we fall back to "local default", though? We'd run into trouble later as we use "local default" that may correspond to a non-branch there as remote-tracking target, or something like that. Not that I care too deeply in the error case, though. I just felt that this early return was an uneven way to follow the principle to err on the side of caution, as we continue with the local default when the other side fails to tell us what their HEAD points at. Thanks.
Hi Junio, On Fri, 3 Sep 2021, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > It would even work if the current line is shorter, but as you point out: > > it is wasteful. And it could be improved to be more readable. I reworked > > it, and it now looks like this: > > > > if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) { > > const char *line = out.buf; > > > > while (*line) { > > const char *eol = strchrnul(line, '\n'), *p; > > size_t len = eol - line; > > char *branch; > > > > if (!skip_prefix(line, "ref: ", &p) || > > !strip_suffix_mem(line, &len, "\tHEAD")) { > > line = eol + (*eol == '\n'); > > continue; > > } > > > > eol = line + len; > > if (skip_prefix(p, "refs/heads/", &p)) { > > branch = xstrndup(p, eol - p); > > strbuf_release(&out); > > return branch; > > } > > > > error(_("remote HEAD is not a branch: '%.*s'"), > > (int)(eol - p), p); > > strbuf_release(&out); > > return NULL; > > } > > } > > > > It now parses the output line by line, looking for the expected prefix and > > suffix, then verifies the ref name format, and either returns the short > > branch name or errors out with the message that this is not a branch. > > It is much easier to read and understand how the loop works with > above. Excellent. > >> > + error(_("remote HEAD is not a branch: '%.*s'"), > >> > + (int)(head - ref), ref); > >> > + strbuf_release(&out); > >> > + return NULL; > >> > >> OK. Any symref whose basename is HEAD in their remote-tracking > >> hierarchy would have been rejected earlier in the loop. > >> > >> Is there a particular reason why we return early here, instead of > >> breaking out of hte loop and let the generic "failed to get" code > >> path below to handle this case? > > > > Yes, the reason is that I wanted to err on the side of caution. If the > > remote repository reports a default branch that is not a default branch at > > all, I do not want to pretend that things are fine and then run into > > trouble later when we set up a non-branch as remote-tracking target or > > something like that. > > Wouldn't we have the same problem when the remote end does not > advertise HEAD and we fall back to "local default", though? We'd > run into trouble later as we use "local default" that may correspond > to a non-branch there as remote-tracking target, or something like > that. All true, there will be trouble at some stage. The difference between the two cases, in my mind, is that cloning an empty repository would run into the latter code path and might still have a chance to work as intended by using the local default branch name. In any case, as I indicated earlier, I am _very_ interested in moving as much functionality as possible from Scalar to Git proper. In this instance, I hope to move most of the code from `scalar.c` to `clone.c` (guarded by one or more new options). And as soon as that happens, the discussion we're having right now will be moot ;-) Which means that I want to weigh how much effort to put into polishing an unlikely code path on one side, and on the other side how much effort to put into moving the functionality away from `contrib/` and deleting that unlikely code path. In the same vein, while this patch series contains (mostly) code in `contrib/` (and therefore technically does not need to adhere strictly to Git's code style), it is probably wise to pay closer attention to the code style particularly in those parts that are prone to be moved verbatim (or close to verbatim) to Git proper. Thanks, Dscho > Not that I care too deeply in the error case, though. I just felt > that this early return was an uneven way to follow the principle to > err on the side of caution, as we continue with the local default > when the other side fails to tell us what their HEAD points at. > > Thanks. >
On Wed, Sep 08 2021, Johannes Schindelin wrote: > [...] > Which means that I want to weigh how much effort to put into polishing an > unlikely code path on one side, and on the other side how much effort to > put into moving the functionality away from `contrib/` and deleting that > unlikely code path. > > In the same vein, while this patch series contains (mostly) code in > `contrib/` (and therefore technically does not need to adhere strictly to > Git's code style), it is probably wise to pay closer attention to the code > style particularly in those parts that are prone to be moved verbatim (or > close to verbatim) to Git proper. I don't think we have any such exception to our usual style & preferred code patterns in contrib/* or compat/* in general. In the latter case we have e.g. compat/regex/ and other externally-imported codebases, which we've tried to stylistically modify as little as possible to make subsequent imports easier, ditto sha1dc/ etc. I think the general (but unwritten) rule has been to draw the distinction on whether or not code is still externally maintained and expected to be imported, or if it's expected to be maintained in git.git going forward. I think that this proposed series falls thoroughly in the latter category, but maybe I've misunderstood it.. Also re my [1] I had some (still relevant, but unaddressed) points on v1 about how placing this in contrib/* made certain aspects of integrating it into our build system harder. I was imagining that distinction as purely an internal implementation detail to git.git (make install etc. would behave the same), but per the above it seems to come with deeper connotations than that at least in your mind. 1. https://lore.kernel.org/git/87r1dydp4m.fsf@evledraar.gmail.com
On Mon, Aug 30, 2021 at 2:36 PM Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com> wrote: > ... > COMMANDS > -------- > > +Clone > +~~~~~ > + > +clone [<options>] <url> [<enlistment>]:: > + Clones the specified repository, similar to linkgit:git-clone[1]. By > + default, only commit and tree objects are cloned. Once finished, the > + worktree is located at `<enlistment>/src`. > ++ > +The sparse-checkout feature is enabled (except when run with `--full-clone`) > +and the only files present are those in the top-level directory. Use > +`git sparse-checkout set` to expand the set of directories you want to see, > +or `git sparse-checkout disable` to expand to all files (see > +linkgit:git-sparse-checkout[1] for more details). You can explore the > +subdirectories outside your sparse-checkout by using `git ls-tree HEAD`. Should this be `git ls-tree [-r] HEAD`? Do you expect people to just add directories that are found immediately under the toplevel, rather than some that are a bit deeper?
Hi Elijah, On Mon, 27 Sep 2021, Elijah Newren wrote: > On Mon, Aug 30, 2021 at 2:36 PM Johannes Schindelin via GitGitGadget > <gitgitgadget@gmail.com> wrote: > > > ... > > COMMANDS > > -------- > > > > +Clone > > +~~~~~ > > + > > +clone [<options>] <url> [<enlistment>]:: > > + Clones the specified repository, similar to linkgit:git-clone[1]. By > > + default, only commit and tree objects are cloned. Once finished, the > > + worktree is located at `<enlistment>/src`. > > ++ > > +The sparse-checkout feature is enabled (except when run with `--full-clone`) > > +and the only files present are those in the top-level directory. Use > > +`git sparse-checkout set` to expand the set of directories you want to see, > > +or `git sparse-checkout disable` to expand to all files (see > > +linkgit:git-sparse-checkout[1] for more details). You can explore the > > +subdirectories outside your sparse-checkout by using `git ls-tree HEAD`. > > Should this be `git ls-tree [-r] HEAD`? Do you expect people to just > add directories that are found immediately under the toplevel, rather > than some that are a bit deeper? I fear that `git ls-tree -r HEAD` in any monorepo might be a bit too overwhelming for any reader. But I agree that just looking at HEAD is probably not enough. Maybe we should use `git ls-tree HEAD[:<dir>]`? Ciao, Dscho
On Wed, Oct 6, 2021 at 1:40 PM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Elijah, > > On Mon, 27 Sep 2021, Elijah Newren wrote: > > > On Mon, Aug 30, 2021 at 2:36 PM Johannes Schindelin via GitGitGadget > > <gitgitgadget@gmail.com> wrote: > > > > > ... > > > COMMANDS > > > -------- > > > > > > +Clone > > > +~~~~~ > > > + > > > +clone [<options>] <url> [<enlistment>]:: > > > + Clones the specified repository, similar to linkgit:git-clone[1]. By > > > + default, only commit and tree objects are cloned. Once finished, the > > > + worktree is located at `<enlistment>/src`. > > > ++ > > > +The sparse-checkout feature is enabled (except when run with `--full-clone`) > > > +and the only files present are those in the top-level directory. Use > > > +`git sparse-checkout set` to expand the set of directories you want to see, > > > +or `git sparse-checkout disable` to expand to all files (see > > > +linkgit:git-sparse-checkout[1] for more details). You can explore the > > > +subdirectories outside your sparse-checkout by using `git ls-tree HEAD`. > > > > Should this be `git ls-tree [-r] HEAD`? Do you expect people to just > > add directories that are found immediately under the toplevel, rather > > than some that are a bit deeper? > > I fear that `git ls-tree -r HEAD` in any monorepo might be a bit too > overwhelming for any reader. Oh, right... > But I agree that just looking at HEAD is probably not enough. Maybe we > should use `git ls-tree HEAD[:<dir>]`? Sounds good.
diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c index 91ceb97e552..13cdfa94d16 100644 --- a/contrib/scalar/scalar.c +++ b/contrib/scalar/scalar.c @@ -7,6 +7,7 @@ #include "parse-options.h" #include "config.h" #include "run-command.h" +#include "refs.h" /* * Remove the deepest subdirectory in the provided path string. Path must not @@ -258,6 +259,204 @@ static int unregister_dir(void) return res; } +/* printf-style interface, expects `<key>=<value>` argument */ +static int set_config(const char *fmt, ...) +{ + struct strbuf buf = STRBUF_INIT; + char *value; + int res; + va_list args; + + va_start(args, fmt); + strbuf_vaddf(&buf, fmt, args); + va_end(args); + + value = strchr(buf.buf, '='); + if (value) + *(value++) = '\0'; + res = git_config_set_gently(buf.buf, value); + strbuf_release(&buf); + + return res; +} + +static char *remote_default_branch(const char *url) +{ + struct child_process cp = CHILD_PROCESS_INIT; + struct strbuf out = STRBUF_INIT; + + cp.git_cmd = 1; + strvec_pushl(&cp.args, "ls-remote", "--symref", url, "HEAD", NULL); + strbuf_addstr(&out, "-\n"); + if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) { + char *ref = out.buf; + + while ((ref = strstr(ref + 1, "\nref: "))) { + const char *p; + char *head, *branch; + + ref += strlen("\nref: "); + head = strstr(ref, "\tHEAD"); + + if (!head || memchr(ref, '\n', head - ref)) + continue; + + if (skip_prefix(ref, "refs/heads/", &p)) { + branch = xstrndup(p, head - p); + strbuf_release(&out); + return branch; + } + + error(_("remote HEAD is not a branch: '%.*s'"), + (int)(head - ref), ref); + strbuf_release(&out); + return NULL; + } + } + warning(_("failed to get default branch name from remote; " + "using local default")); + strbuf_reset(&out); + + child_process_init(&cp); + cp.git_cmd = 1; + strvec_pushl(&cp.args, "symbolic-ref", "--short", "HEAD", NULL); + if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) { + strbuf_trim(&out); + return strbuf_detach(&out, NULL); + } + + strbuf_release(&out); + error(_("failed to get default branch name")); + return NULL; +} + +static int cmd_clone(int argc, const char **argv) +{ + const char *branch = NULL; + int full_clone = 0; + struct option clone_options[] = { + OPT_STRING('b', "branch", &branch, N_("<branch>"), + N_("branch to checkout after clone")), + OPT_BOOL(0, "full-clone", &full_clone, + N_("when cloning, create full working directory")), + OPT_END(), + }; + const char * const clone_usage[] = { + N_("scalar clone [<options>] [--] <repo> [<dir>]"), + NULL + }; + const char *url; + char *enlistment = NULL, *dir = NULL; + struct strbuf buf = STRBUF_INIT; + int res; + + argc = parse_options(argc, argv, NULL, clone_options, clone_usage, 0); + + if (argc == 2) { + url = argv[0]; + enlistment = xstrdup(argv[1]); + } else if (argc == 1) { + url = argv[0]; + + strbuf_addstr(&buf, url); + /* Strip trailing slashes, if any */ + while (buf.len > 0 && is_dir_sep(buf.buf[buf.len - 1])) + strbuf_setlen(&buf, buf.len - 1); + /* Strip suffix `.git`, if any */ + strbuf_strip_suffix(&buf, ".git"); + + enlistment = find_last_dir_sep(buf.buf); + if (!enlistment) { + die(_("cannot deduce worktree name from '%s'"), url); + } + enlistment = xstrdup(enlistment + 1); + } else { + usage_msg_opt(_("You must specify a repository to clone."), + clone_usage, clone_options); + } + + if (is_directory(enlistment)) + die(_("directory '%s' exists already"), enlistment); + + dir = xstrfmt("%s/src", enlistment); + + strbuf_reset(&buf); + if (branch) + strbuf_addf(&buf, "init.defaultBranch=%s", branch); + else { + char *b = repo_default_branch_name(the_repository, 1); + strbuf_addf(&buf, "init.defaultBranch=%s", b); + free(b); + } + + if ((res = run_git("-c", buf.buf, "init", "--", dir, NULL))) + goto cleanup; + + if (chdir(dir) < 0) { + res = error_errno(_("could not switch to '%s'"), dir); + goto cleanup; + } + + setup_git_directory(); + + /* common-main already logs `argv` */ + trace2_def_repo(the_repository); + + if (!branch && !(branch = remote_default_branch(url))) { + res = error(_("failed to get default branch for '%s'"), url); + goto cleanup; + } + + if (set_config("remote.origin.url=%s", url) || + set_config("remote.origin.fetch=" + "+refs/heads/*:refs/remotes/origin/*") || + set_config("remote.origin.promisor=true") || + set_config("remote.origin.partialCloneFilter=blob:none")) { + res = error(_("could not configure remote in '%s'"), dir); + goto cleanup; + } + + if (!full_clone && + (res = run_git("sparse-checkout", "init", "--cone", NULL))) + goto cleanup; + + if (set_recommended_config()) + return error(_("could not configure '%s'"), dir); + + if ((res = run_git("fetch", "--quiet", "origin", NULL))) { + warning(_("partial clone failed; attempting full clone")); + + if (set_config("remote.origin.promisor") || + set_config("remote.origin.partialCloneFilter")) { + res = error(_("could not configure for full clone")); + goto cleanup; + } + + if ((res = run_git("fetch", "--quiet", "origin", NULL))) + goto cleanup; + } + + if ((res = set_config("branch.%s.remote=origin", branch))) + goto cleanup; + if ((res = set_config("branch.%s.merge=refs/heads/%s", + branch, branch))) + goto cleanup; + + strbuf_reset(&buf); + strbuf_addf(&buf, "origin/%s", branch); + res = run_git("checkout", "-f", "-t", buf.buf, NULL); + if (res) + goto cleanup; + + res = register_dir(); + +cleanup: + free(enlistment); + free(dir); + strbuf_release(&buf); + return res; +} + static int cmd_list(int argc, const char **argv) { if (argc != 1) @@ -354,6 +553,7 @@ static struct { const char *name; int (*fn)(int, const char **); } builtins[] = { + { "clone", cmd_clone }, { "list", cmd_list }, { "register", cmd_register }, { "unregister", cmd_unregister }, diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt index e1f629fddad..90d59f1d79f 100644 --- a/contrib/scalar/scalar.txt +++ b/contrib/scalar/scalar.txt @@ -8,6 +8,7 @@ scalar - an opinionated repository management tool SYNOPSIS -------- [verse] +scalar clone [--branch <main-branch>] [--full-clone] <url> [<enlistment>] scalar list scalar register [<enlistment>] scalar unregister [<enlistment>] @@ -29,19 +30,43 @@ an existing Git worktree with Scalar whose name is not `src`, the enlistment will be identical to the worktree. The `scalar` command implements various subcommands, and different options -depending on the subcommand. With the exception of `list`, all subcommands -expect to be run in an enlistment. +depending on the subcommand. With the exception of `clone` and `list`, all +subcommands expect to be run in an enlistment. COMMANDS -------- +Clone +~~~~~ + +clone [<options>] <url> [<enlistment>]:: + Clones the specified repository, similar to linkgit:git-clone[1]. By + default, only commit and tree objects are cloned. Once finished, the + worktree is located at `<enlistment>/src`. ++ +The sparse-checkout feature is enabled (except when run with `--full-clone`) +and the only files present are those in the top-level directory. Use +`git sparse-checkout set` to expand the set of directories you want to see, +or `git sparse-checkout disable` to expand to all files (see +linkgit:git-sparse-checkout[1] for more details). You can explore the +subdirectories outside your sparse-checkout by using `git ls-tree HEAD`. + +-b <name>:: +--branch <name>:: + Instead of checking out the branch pointed to by the cloned repository's + HEAD, check out the `<name>` branch instead. + +--[no-]full-clone:: + A sparse-checkout is initialized by default. This behavior can be turned + off via `--full-clone`. + List ~~~~ list:: To see which repositories are currently registered by the service, run - `scalar list`. This subcommand does not need to be run inside a Scalar - enlistment. + `scalar list`. This subcommand, like `clone`, does not need to be run + inside a Scalar enlistment. Register ~~~~~~~~ @@ -61,7 +86,7 @@ unregister [<enlistment>]:: SEE ALSO -------- -linkgit:git-maintenance[1]. +linkgit:git-clone[1], linkgit:git-maintenance[1]. Scalar --- diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh index ef0e8d680d5..295398f62cc 100755 --- a/contrib/scalar/t/t9099-scalar.sh +++ b/contrib/scalar/t/t9099-scalar.sh @@ -10,6 +10,9 @@ PATH=$PWD/..:$PATH . ../../../t/test-lib.sh +GIT_TEST_MAINT_SCHEDULER="crontab:test-tool crontab ../cron.txt" +export GIT_TEST_MAINT_SCHEDULER + test_expect_success 'scalar shows a usage' ' test_expect_code 129 scalar -h ' @@ -29,4 +32,33 @@ test_expect_success 'scalar unregister' ' ! grep -F "$(pwd)/vanish/src" scalar.repos ' +test_expect_success 'set up repository to clone' ' + test_commit first && + test_commit second && + test_commit third && + git switch -c parallel first && + mkdir -p 1/2 && + test_commit 1/2/3 && + git config uploadPack.allowFilter true && + git config uploadPack.allowAnySHA1InWant true +' + +test_expect_success 'scalar clone' ' + second=$(git rev-parse --verify second:second.t) && + scalar clone "file://$(pwd)" cloned && + ( + cd cloned/src && + + git config --get --global --fixed-value maintenance.repo \ + "$(pwd)" && + + test_path_is_missing 1/2 && + test_must_fail git rev-list --missing=print $second && + git rev-list $second && + git cat-file blob $second >actual && + echo "second" >expect && + test_cmp expect actual + ) +' + test_done