Message ID | 20180921223558.65055-2-sbeller@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fetch: make sure submodule oids are fetched | expand |
On Fri, Sep 21 2018, Stefan Beller wrote: > +/* > + * Apply want to each entry in array, retaining only the entries for > + * which the function returns true. Preserve the order of the entries > + * that are retained. > + */ > +void oid_array_filter(struct oid_array *array, > + for_each_oid_fn want, > + void *cbdata); > + > #endif /* SHA1_ARRAY_H */ The code LGTM, but this comment should instead be an update to the API docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects by type, then SHA-1", 2018-05-10) for an addition of a new function to this API where I added some new docs.
On Sat, Sep 22, 2018 at 5:58 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > > On Fri, Sep 21 2018, Stefan Beller wrote: > > > +/* > > + * Apply want to each entry in array, retaining only the entries for > > + * which the function returns true. Preserve the order of the entries > > + * that are retained. > > + */ > > +void oid_array_filter(struct oid_array *array, > > + for_each_oid_fn want, > > + void *cbdata); > > + > > #endif /* SHA1_ARRAY_H */ > > The code LGTM, but this comment should instead be an update to the API > docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects > by type, then SHA-1", 2018-05-10) for an addition of a new function to > this API where I added some new docs. ok will fix for consistency (this whole API is there). Longer term (I thought) we were trying to migrate API docs to headers instead? Stefan
On Tue, Sep 25, 2018 at 12:26:44PM -0700, Stefan Beller wrote: > On Sat, Sep 22, 2018 at 5:58 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: > > > > > > On Fri, Sep 21 2018, Stefan Beller wrote: > > > > > +/* > > > + * Apply want to each entry in array, retaining only the entries for > > > + * which the function returns true. Preserve the order of the entries > > > + * that are retained. > > > + */ > > > +void oid_array_filter(struct oid_array *array, > > > + for_each_oid_fn want, > > > + void *cbdata); > > > + > > > #endif /* SHA1_ARRAY_H */ > > > > The code LGTM, but this comment should instead be an update to the API > > docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects > > by type, then SHA-1", 2018-05-10) for an addition of a new function to > > this API where I added some new docs. > > ok will fix for consistency (this whole API is there). > > Longer term (I thought) we were trying to migrate API docs > to headers instead? Yes, please. I think it prevents exactly this sort of confusion. :) -Peff
Jeff King <peff@peff.net> writes: > On Tue, Sep 25, 2018 at 12:26:44PM -0700, Stefan Beller wrote: > >> On Sat, Sep 22, 2018 at 5:58 AM Ævar Arnfjörð Bjarmason >> <avarab@gmail.com> wrote: >> > >> > >> > On Fri, Sep 21 2018, Stefan Beller wrote: >> > >> > > +/* >> > > + * Apply want to each entry in array, retaining only the entries for >> > > + * which the function returns true. Preserve the order of the entries >> > > + * that are retained. >> > > + */ >> > > +void oid_array_filter(struct oid_array *array, >> > > + for_each_oid_fn want, >> > > + void *cbdata); >> > > + >> > > #endif /* SHA1_ARRAY_H */ >> > >> > The code LGTM, but this comment should instead be an update to the API >> > docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects >> > by type, then SHA-1", 2018-05-10) for an addition of a new function to >> > this API where I added some new docs. >> >> ok will fix for consistency (this whole API is there). >> >> Longer term (I thought) we were trying to migrate API docs >> to headers instead? > > Yes, please. I think it prevents exactly this sort of confusion. :) CodingGuidelines or SubmittingPatches update, perhaps? Documentation/CodingGuidelines | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines index 48aa4edfbd..b54684e807 100644 --- a/Documentation/CodingGuidelines +++ b/Documentation/CodingGuidelines @@ -358,7 +358,11 @@ For C programs: string_list for sorted string lists, a hash map (mapping struct objects) named "struct decorate", amongst other things. - - When you come up with an API, document it. + - When you come up with an API, document it. It used to be + encouraged to do so in Documentation/technical/, and the birds-eye + level overview may still be more suitable there, but detailed + function-by-function level of documentation is done by comments in + corresponding .h files these days. - The first #include in C files, except in platform specific compat/ implementations, must be either "git-compat-util.h", "cache.h" or
On Wed, Sep 26 2018, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > >> On Tue, Sep 25, 2018 at 12:26:44PM -0700, Stefan Beller wrote: >> >>> On Sat, Sep 22, 2018 at 5:58 AM Ævar Arnfjörð Bjarmason >>> <avarab@gmail.com> wrote: >>> > >>> > >>> > On Fri, Sep 21 2018, Stefan Beller wrote: >>> > >>> > > +/* >>> > > + * Apply want to each entry in array, retaining only the entries for >>> > > + * which the function returns true. Preserve the order of the entries >>> > > + * that are retained. >>> > > + */ >>> > > +void oid_array_filter(struct oid_array *array, >>> > > + for_each_oid_fn want, >>> > > + void *cbdata); >>> > > + >>> > > #endif /* SHA1_ARRAY_H */ >>> > >>> > The code LGTM, but this comment should instead be an update to the API >>> > docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects >>> > by type, then SHA-1", 2018-05-10) for an addition of a new function to >>> > this API where I added some new docs. >>> >>> ok will fix for consistency (this whole API is there). >>> >>> Longer term (I thought) we were trying to migrate API docs >>> to headers instead? >> >> Yes, please. I think it prevents exactly this sort of confusion. :) > > CodingGuidelines or SubmittingPatches update, perhaps? > > Documentation/CodingGuidelines | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines > index 48aa4edfbd..b54684e807 100644 > --- a/Documentation/CodingGuidelines > +++ b/Documentation/CodingGuidelines > @@ -358,7 +358,11 @@ For C programs: > string_list for sorted string lists, a hash map (mapping struct > objects) named "struct decorate", amongst other things. > > - - When you come up with an API, document it. > + - When you come up with an API, document it. It used to be > + encouraged to do so in Documentation/technical/, and the birds-eye > + level overview may still be more suitable there, but detailed > + function-by-function level of documentation is done by comments in > + corresponding .h files these days. > > - The first #include in C files, except in platform specific compat/ > implementations, must be either "git-compat-util.h", "cache.h" or Thanks. I had not looked at this closely and was under the false impression that it was going in the other direction. Good to have it clarified.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Wed, Sep 26 2018, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >>> >>> Yes, please. I think it prevents exactly this sort of confusion. :) >> >> CodingGuidelines or SubmittingPatches update, perhaps? >> >> Documentation/CodingGuidelines | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines >> index 48aa4edfbd..b54684e807 100644 >> --- a/Documentation/CodingGuidelines >> +++ b/Documentation/CodingGuidelines >> @@ -358,7 +358,11 @@ For C programs: >> string_list for sorted string lists, a hash map (mapping struct >> objects) named "struct decorate", amongst other things. >> >> - - When you come up with an API, document it. >> + - When you come up with an API, document it. It used to be >> + encouraged to do so in Documentation/technical/, and the birds-eye >> + level overview may still be more suitable there, but detailed >> + function-by-function level of documentation is done by comments in >> + corresponding .h files these days. >> >> - The first #include in C files, except in platform specific compat/ >> implementations, must be either "git-compat-util.h", "cache.h" or > > Thanks. I had not looked at this closely and was under the false > impression that it was going in the other direction. Good to have it > clarified. Heh, I knew people were in favor of one over the other but until Peff chimed in to this thread, I didn't recall which one was preferred, partly because I personally do not see a huge advantage in using in-code comments as docs for programmers, and do not like having to read them as in-code comments. If somebody wants to wordsmith the text and send in a patch with good log message, please do so, as I myself am not sure if what I wrote is the consensus position. It could be that they want to have even birds-eye overview in the header files.
On Wed, Sep 26, 2018 at 11:27:53AM -0700, Junio C Hamano wrote: > >> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines > >> index 48aa4edfbd..b54684e807 100644 > >> --- a/Documentation/CodingGuidelines > >> +++ b/Documentation/CodingGuidelines > >> @@ -358,7 +358,11 @@ For C programs: > >> string_list for sorted string lists, a hash map (mapping struct > >> objects) named "struct decorate", amongst other things. > >> > >> - - When you come up with an API, document it. > >> + - When you come up with an API, document it. It used to be > >> + encouraged to do so in Documentation/technical/, and the birds-eye > >> + level overview may still be more suitable there, but detailed > >> + function-by-function level of documentation is done by comments in > >> + corresponding .h files these days. > >> > >> - The first #include in C files, except in platform specific compat/ > >> implementations, must be either "git-compat-util.h", "cache.h" or > > > > Thanks. I had not looked at this closely and was under the false > > impression that it was going in the other direction. Good to have it > > clarified. > > Heh, I knew people were in favor of one over the other but until > Peff chimed in to this thread, I didn't recall which one was > preferred, partly because I personally do not see a huge advantage > in using in-code comments as docs for programmers, and do not like > having to read them as in-code comments. > > If somebody wants to wordsmith the text and send in a patch with > good log message, please do so, as I myself am not sure if what I > wrote is the consensus position. It could be that they want to have > even birds-eye overview in the header files. Yes, I would say that everything should go into the header files. For the same reason that the function descriptions should go there: by being close to the thing being changed, it is more likely that authors will remember to adjust the documentation. It's not exactly literate programming, but it's a step in that direction. That's just my opinion, of course. I've been very happy so far with the documentation that we have transitioned. We talked a while ago about a script to extract the comments into a "just documentation" and build it as asciidoc, but I doubt I would use such a thing myself. I do agree in general that mentioning this in CodingGuidelines is a good idea. -Peff
On Wed, Sep 26 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> On Wed, Sep 26 2018, Junio C Hamano wrote: >> >>> Jeff King <peff@peff.net> writes: >>>> >>>> Yes, please. I think it prevents exactly this sort of confusion. :) >>> >>> CodingGuidelines or SubmittingPatches update, perhaps? >>> >>> Documentation/CodingGuidelines | 6 +++++- >>> 1 file changed, 5 insertions(+), 1 deletion(-) >>> >>> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines >>> index 48aa4edfbd..b54684e807 100644 >>> --- a/Documentation/CodingGuidelines >>> +++ b/Documentation/CodingGuidelines >>> @@ -358,7 +358,11 @@ For C programs: >>> string_list for sorted string lists, a hash map (mapping struct >>> objects) named "struct decorate", amongst other things. >>> >>> - - When you come up with an API, document it. >>> + - When you come up with an API, document it. It used to be >>> + encouraged to do so in Documentation/technical/, and the birds-eye >>> + level overview may still be more suitable there, but detailed >>> + function-by-function level of documentation is done by comments in >>> + corresponding .h files these days. >>> >>> - The first #include in C files, except in platform specific compat/ >>> implementations, must be either "git-compat-util.h", "cache.h" or >> >> Thanks. I had not looked at this closely and was under the false >> impression that it was going in the other direction. Good to have it >> clarified. > > Heh, I knew people were in favor of one over the other but until > Peff chimed in to this thread, I didn't recall which one was > preferred, partly because I personally do not see a huge advantage > in using in-code comments as docs for programmers, and do not like > having to read them as in-code comments. > > If somebody wants to wordsmith the text and send in a patch with > good log message, please do so, as I myself am not sure if what I > wrote is the consensus position. It could be that they want to have > even birds-eye overview in the header files. While we're on that subject, I'd much prefer if these technical docs and other asciidoc-ish stuff we would be manpages build as part of our normal "make doc" target. So e.g. this one would be readable via something like "man 3 gitapi-oid-array". They could still be maintained along with the code given a sufficiently smart doc build system, e.g. perl does that: https://github.com/Perl/perl5/blob/v5.26.0/av.c#L294-L387 Although doing that in some readable and sane way would mean getting rid of your beloved multi-line /* ... */ comment formatting, at least for that case. It would be a pain to have everyone configure their editor to word-wrap lines with leading "*" without screwing up the asciidoc format.
On Wed, Sep 26, 2018 at 08:43:14PM +0200, Ævar Arnfjörð Bjarmason wrote: > While we're on that subject, I'd much prefer if these technical docs and > other asciidoc-ish stuff we would be manpages build as part of our > normal "make doc" target. So e.g. this one would be readable via > something like "man 3 gitapi-oid-array". I'm mildly negative on this, just because it introduces extra work on people writing the documentation. Now it has to follow special formatting rules, and sometimes the source is uglier (e.g., the horrible +-continuation in lists). Are authors now responsible for formatting any changes they make to make sure they look good in asciidoc? Or are people who care about the formatted output going to come along afterwards and submit fixup patches? Either way it seems like make-work. Now I'll admit it seems like make-work to me because I would not plan to ever look at the formatted output myself. But I guess I don't understand the audience for this formatted output. These are APIs internal to Git itself. We would not generally want to install gitapi-oid-array into /usr/share/man, because only people actually working on Git would be able to use it. So it sounds like a convenience for a handful of developers (who like to look at this manpage versus the source). It doesn't seem like the cost/benefit is there. And if we were going to generate something external, would it make more sense to write in a structured format like doxygen? I am not a big fan of it myself, but at least from there you can generate a more richly interconnected set of documentation. -Peff
On Wed, Sep 26, 2018 at 11:58 AM Jeff King <peff@peff.net> wrote: > Now I'll admit it seems like make-work to me because I would not plan to > ever look at the formatted output myself. But I guess I don't understand > the audience for this formatted output. I have been watching from the side lines so far, as I have a view that is very much like Peffs, if I were to dictate my opinion onto the project, we'd: * put all internal API docs into headerfiles. * get rid of all Documentation/technical/ that describes things at a function level. So the remaining things are high level docs such as protocol, hash transition plan, partial clones... But I'd want to understand why we are not doing this (obvious to me) best thing, I have the impression other people use the docs differently. How are these docs (for example api-oid-array or api-revision-walking) used? I usually tend to use git-grep a lot when looking for a function, just to jump to the header file and read the comment there and go around the header file looking if we have a better suited thing. (Also I tend to like to have fewer files open, such that I prefer a header file of reasonable size over 2 files, one docs and one header) So when I find a function documented in Docs/technical, I would first read there, then go to the header to see the signature and then make use of it. If I recall an earlier discussion on this topic, I recall Junio saying that this *may* pollute the header files, as for example sha1-array.h is easy to grok in its entirety, and makes it easy to see what functions regarding oid_arrays are available. A counter example would be hashmap.h, as that requires some scrolling and skimming. > These are APIs internal to Git > itself. We would not generally want to install gitapi-oid-array into > /usr/share/man, because only people actually working on Git would be > able to use it. So it sounds like a convenience for a handful of > developers (who like to look at this manpage versus the source). It > doesn't seem like the cost/benefit is there. If this type of docs makes Ævar more productive, I am all for keeping it. > And if we were going to generate something external, would it make more > sense to write in a structured format like doxygen? I am not a big fan > of it myself, but at least from there you can generate a more richly > interconnected set of documentation. That sounds sensible to me as the additional burden of yet-another-tool is only imposed to the developers, but doxygen would provide a better output in my experience. Stefan
Jeff King <peff@peff.net> writes: > Now I'll admit it seems like make-work to me because I would not plan to > ever look at the formatted output myself. But I guess I don't understand > the audience for this formatted output. These are APIs internal to Git > itself. We would not generally want to install gitapi-oid-array into > /usr/share/man, because only people actually working on Git would be > able to use it. So it sounds like a convenience for a handful of > developers (who like to look at this manpage versus the source). It > doesn't seem like the cost/benefit is there. > > And if we were going to generate something external, would it make more > sense to write in a structured format like doxygen? I am not a big fan > of it myself, but at least from there you can generate a more richly > interconnected set of documentation. I agree on both counts. I just like to read these in plain text while I am coding for Git (or reviewing patches coded for Git). The reason why I have mild preference to D/technical/ over in-header doc is only because I find even these asterisks at the left-side-end distracting; it is not that materials in D/technical could be passed through AsciiDoc.
> I agree on both counts. I just like to read these in plain text > while I am coding for Git (or reviewing patches coded for Git). > > The reason why I have mild preference to D/technical/ over in-header > doc is only because I find even these asterisks at the left-side-end > distracting; it is not that materials in D/technical could be passed > through AsciiDoc. /** Would we be open to consider another commenting style? AFAICT the asterisks are a cultural thing and not enforced by and old compiler not understanding these comments. Just like ending the comment in an asymetric fashion ;-) */ (I am not seriously proposing unbalanced comments, but for longer documenting comments we may want to consider, omitting the asterisk?)
Stefan Beller <sbeller@google.com> writes: >> I agree on both counts. I just like to read these in plain text >> while I am coding for Git (or reviewing patches coded for Git). >> >> The reason why I have mild preference to D/technical/ over in-header >> doc is only because I find even these asterisks at the left-side-end >> distracting; it is not that materials in D/technical could be passed >> through AsciiDoc. > > /** > Would we be open to consider another commenting style? No. You still cannot write */ in that comment section, for example, so you cannot do "plain text" still---that is not a great trade off to deviate from the common comment style used for other stuff. When I say I want plain text, I mean it. Anything you write in .h cannot be plain text, unless you make all readers agree with some convention, and "there is always '*' at the left edge" is such an acceptable convention can learn to live with, just like everybody else does ; at least, that is consistent with other comments. > (I am not seriously proposing unbalanced comments, but for > longer documenting comments we may want to consider, > omitting the asterisk?) If this is not a serious proposal, then I'll stop wasting everybody's time. Now it is clear that the only pieces of consensus we got out of this discussion so far is that we want to add to CodingGuidelines, that my earlier "something like this" is not the text we want and that we want everything in the header as comment. So let's see if we can have usable alternative, apply that and move on.
[Changing subject because this stopped being about Stefan's patches a while ago] On Wed, Sep 26 2018, Jeff King wrote: > On Wed, Sep 26, 2018 at 08:43:14PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> While we're on that subject, I'd much prefer if these technical docs and >> other asciidoc-ish stuff we would be manpages build as part of our >> normal "make doc" target. So e.g. this one would be readable via >> something like "man 3 gitapi-oid-array". [responding out of order] > Now I'll admit it seems like make-work to me because I would not plan to > ever look at the formatted output myself. But I guess I don't understand > the audience for this formatted output. These are APIs internal to Git > itself. We would not generally want to install gitapi-oid-array into > /usr/share/man, because only people actually working on Git would be > able to use it. So it sounds like a convenience for a handful of > developers (who like to look at this manpage versus the source). It > doesn't seem like the cost/benefit is there. > > And if we were going to generate something external, would it make more > sense to write in a structured format like doxygen? I am not a big fan > of it myself, but at least from there you can generate a more richly > interconnected set of documentation. It's useful to have a single authoritative source for all documentation that's easy to search through. I don't really care where that text actually lives, i.e. whether it's all in *.txt to begin with, or in *.c or *.h and pulled together by some "make" step, or much how it's formatted, beyond it being easy to work with. I'm not a big fan of asciidoc, but it's what we've got if we want formatting, inter-linking etc. My bias here is that I've also contributed actively to the perl project in the past, and with that project you can get an overview of *all* of the docs by typing: man perl That includes stuff like perl585delta(1) which we'd stick in Documentation/RelNotes, and "Internals and C Language Interface". Most of what we'd put in Documentation/technical/api-* & headers is in perlapi(1). I spent an embarrassingly long time submitting patches to git before discovering that Documentation/technical/api-*.txt even existed, I think something like 1-2 years ago. We have very well documented stuff like strbuf.h (mostly thanks to you), but how is such documentation supposed to be discovered? Something like: git grep -A20 '^/\*$' -- *.h <hold in page down> ? In terms of getting an overview it's indistinguishable from comments. I.e. there's nothing like an index of: man git-api-strbuf ==> working with strings man git-api-sha1-array ==> list, iterate and binary lookup SHA1s etc. Sometimes it's obvious where to look, but as someone who uses most of these APIs very occasionally (because I contribute less) I keep (re-)discovering various internal APIs. I'm also not in the habit of opening the *.h file for something, I usually just start reading the *.c and only later discover there's some API docs in the relevant *.h (or not). So in terms of implementation I'm a big fan of the perl.git approach of having these docs as comments before the function implementation in the *.c file. It means you can't avoid seeing it or updating it when source spelunking, and it leaves header files short, which is useful when you'd like to get a general API overview without all the docs. Our documented headers are quite fat, e.g. strbuf.h is 60% of the size of strbuf.c, but less than 20% when you strip the docs. > I'm mildly negative on this, just because it introduces extra work on > people writing the documentation. Now it has to follow special > formatting rules, and sometimes the source is uglier (e.g., the horrible > +-continuation in lists). Are authors now responsible for formatting any > changes they make to make sure they look good in asciidoc? Or are people > who care about the formatted output going to come along afterwards and > submit fixup patches? Either way it seems like make-work. This part I'm slightly confused by. If you're just saying "let's document stuff in *.h files in free-flowing text", fine. But if we're talking about the difference between Documentation/technical/api-*.txt and having the same stuff in manpages, then the stuff in api-*.txt *is* already asciidoc, and we have a target to format it all up and ship it with git, and distros ship that. E.g. on Debian you can "apt install git-doc" and browse stuff like file:///usr/share/doc/git-doc/technical/api-argv-array.html which is the HTML formatted version, same for all the other api-*.txt docs. We just don't have some central index of those, and they're not a default target which means the likes of our own https://git-scm.com don't build them, so they're harder to find. Every perl installation also ships perlapi and a MB or so of obscure internal docs to everyone, which makes it easier to find via Google et al, which I find useful. So I guess I'm more on the side fence of dropping a hypothetical gitapi-oid-array into /usr/share/man than you are. ANYWAY This E-mail is much longer than I intended, sorry about that. I didn't just mean to bitch about how we ship docs, but I was wondering if there was a desire to move to something like what I've outlined above, or whether the status quo was mostly by design and intended. I just thought I'd write this rather lengthy E-Mails because this is one of the rare areas where you can fully describe an idea in E-Mail without writing any patches. If the consensus is that something like the exhaustive index "perldoc perl" provides wouldn't be useful for git I can just drop this, but if people are enthusiastic about having that it would be useful to work on it...
On Wed, Sep 26, 2018 at 01:19:20PM -0700, Junio C Hamano wrote: > > /** > > Would we be open to consider another commenting style? > > No. You still cannot write */ in that comment section, for example, > so you cannot do "plain text" still---that is not a great trade off > to deviate from the common comment style used for other stuff. We can do: #if HEADER_DOC Write anything here except #endif! #endif /* HEADER_DOC */ But I actually think that is more jarring than a big comment (when I am reading C code, I expect to see C-like things mostly. I do agree with you that a true plain text file is nicer in terms of freedom to format, etc. But IMHO the tradeoff is easily worth it to keep the related content (function declarations and their documentation) close together. -Peff
> > (I am not seriously proposing unbalanced comments, but for > > longer documenting comments we may want to consider, > > omitting the asterisk?) > > If this is not a serious proposal, then I'll stop wasting > everybody's time. I proposed two things, one of which I was not serious about. The other has downsides as both you and Peff point out. So living with comments may not be the worst after all. I'll read&reply to Aevars proposal now.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > So in terms of implementation I'm a big fan of the perl.git approach of > having these docs as comments before the function implementation in the > *.c file. > > It means you can't avoid seeing it or updating it when source > spelunking, and it leaves header files short, which is useful when you'd > like to get a general API overview without all the docs. Our documented > headers are quite fat, e.g. strbuf.h is 60% of the size of strbuf.c, but > less than 20% when you strip the docs. Just like you I do not care too much where the authoritative source lives, and I do agree with Peff that separate text file is much more likely to go stale wrt the code, and that it is an acceptable trade off to write docs in comment and live with slightly less freedom than writing in plain text in order to keep the docs and source closer together. I do not think between Peff and me, neither of us were contrasting in-header vs in-code comments. And I tend to agree that it makes it even harder to let doc and code drift apart to have the doc near the implementation (as opposed to in the header). It probably makes the result less useful, unless the project invests in making the result accessible like "man perlapi" does, than having them in the header for people who view headers as guide to users of API, though. > We just don't have some central index of those, and they're not a > default target which means the likes of our own https://git-scm.com > don't build them, so they're harder to find. > > Every perl installation also ships perlapi and a MB or so of obscure > internal docs to everyone, which makes it easier to find via Google et > al, which I find useful. So I guess I'm more on the side fence of > dropping a hypothetical gitapi-oid-array into /usr/share/man than you > are. Regardless of where we place docs (between .c/.h or .txt), making them indexable and investing in make target to actually produce these docs with ToC is different matter. I do not think people who actually spent time on our docs had enough inclination to do so themselves, but that should not prevent you or other like-minded people from leading by example.
On Wed, Sep 26, 2018 at 1:44 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > And if we were going to generate something external, would it make more > > sense to write in a structured format like doxygen? I am not a big fan > > of it myself, but at least from there you can generate a more richly > > interconnected set of documentation. > > It's useful to have a single authoritative source for all documentation > that's easy to search through. If that is the case I would propose to keep it all in header files and organize the headers. > That includes stuff like perl585delta(1) which we'd stick in > Documentation/RelNotes, and "Internals and C Language Interface". Most > of what we'd put in Documentation/technical/api-* & headers is in > perlapi(1). This seems cool, but was also a recent introduction? perl400delta seems to yield nothing for me (which may be because I do not have an old version of perl installed?) > > Sometimes it's obvious where to look, but as someone who uses most of > these APIs very occasionally (because I contribute less) I keep > (re-)discovering various internal APIs. Sometimes I have the same feeling. Maybe more structure in the source files would help (e.g. datastructures/{strbuf, string-list}.h and objects/{packfile.h, sha1*} ? > Every perl installation also ships perlapi and a MB or so of obscure > internal docs to everyone, which makes it easier to find via Google et > al, which I find useful. So I guess I'm more on the side fence of > dropping a hypothetical gitapi-oid-array into /usr/share/man than you > are. > Personally I would not want to ship our internal docs everywhere as it seems like overly wasteful. But then, only my early days of childhood were guided by Internet that is not available anywhere and everywhere. Today I would just take for granted that I can lookup things in github/git/git when I am in not at my regular desk. > ANYWAY > > This E-mail is much longer than I intended, sorry about that. I didn't > just mean to bitch about how we ship docs, but I was wondering if there > was a desire to move to something like what I've outlined above, or > whether the status quo was mostly by design and intended. > > I just thought I'd write this rather lengthy E-Mails because this is one > of the rare areas where you can fully describe an idea in E-Mail without > writing any patches. > > If the consensus is that something like the exhaustive index "perldoc > perl" provides wouldn't be useful for git I can just drop this, but if > people are enthusiastic about having that it would be useful to work on > it... I agree with Junio, as that the discoverability is unrelated to where to store the actual docs, Documentation/technical/* has the advantage that it only has "good" stuff, i.e. some topic that someone cared enough to write about and it is easy to guess if it is relevant to your need. In *.h, we have a lot of false positives (xdiff-interface.h or cache.h just pollute the searching space when looking for suitable storage classes.) So I wonder if we'd want to have a list (similar as in command-list.txt) that describes the header files and gives some classification of them, for example one class could be the data structures (strbuf, stringlist, hashmap), algorithms (diff, range-diff), OS specific stuff (run-command, trace, alloc) or Git specific things (apply, fetch, submodule)
On Wed, Sep 26, 2018 at 10:44:33PM +0200, Ævar Arnfjörð Bjarmason wrote: > My bias here is that I've also contributed actively to the perl project > in the past, and with that project you can get an overview of *all* of > the docs by typing: > > man perl > > That includes stuff like perl585delta(1) which we'd stick in > Documentation/RelNotes, and "Internals and C Language Interface". Most > of what we'd put in Documentation/technical/api-* & headers is in > perlapi(1). I like the perl documentation, too. But I think most of what you're talking about there is the public API, where you have many readers. But here we're talking about internal functions (albeit ones we hope to see reused across the codebase). My concern is mostly: is the work in maintaining this documentation-formatting system worth the number of readers it will get? There are two numbers there, and I'm guessing at both of them. ;) If you're interested in pulling documentation out of the header files and generating asciidoc from it, I'm happy to at least try keeping it up to date. When we started putting this information into header files, we used "/**" to start the comment, as a special marker to indicate it was worth pulling out. I don't know how well we've maintained that convention, but it's a starting point. > I spent an embarrassingly long time submitting patches to git before > discovering that Documentation/technical/api-*.txt even existed, I think > something like 1-2 years ago. That's another thing that I think is improved by keeping the documentation and code together. If you find one, you find the other. You just have to "somehow" find one, which is what you're getting at below. > We have very well documented stuff like strbuf.h (mostly thanks to you), > but how is such documentation supposed to be discovered? Something like: > > git grep -A20 '^/\*$' -- *.h > <hold in page down> > > ? > > In terms of getting an overview it's indistinguishable from > comments. I.e. there's nothing like an index of: > > man git-api-strbuf ==> working with strings > man git-api-sha1-array ==> list, iterate and binary lookup SHA1s I agree that is a problem, especially for contributors less familiar with the code base. But I think generating an index is a separate (and much easier) problem than formatting all of the documentation. We already have the /** convention I mentioned above. Could we have another micro-format like: /** API:strbuf - working with strings */ in each header file? That would make generating such an index pretty trivial. > I'm also not in the habit of opening the *.h file for something, I > usually just start reading the *.c and only later discover there's some > API docs in the relevant *.h (or not). Interesting. I'm not totally opposed to putting the documentation alongside the actual code. It does feel a bit cluttered to me, but I think you're right that it keeps everything as close together as possible. > It means you can't avoid seeing it or updating it when source > spelunking, and it leaves header files short, which is useful when you'd > like to get a general API overview without all the docs. Our documented > headers are quite fat, e.g. strbuf.h is 60% of the size of strbuf.c, but > less than 20% when you strip the docs. I don't use folds in my editor, and I guess nobody else in this thread does, either. But they may be a reasonable tool for "wow, there are comments, declarations, and definitions all together and I just want to view one of them". In vim, try "set foldmethod=syntax" and then "zc" to close the folds. I kind of hate it myself, but just another option to explore. > > I'm mildly negative on this, just because it introduces extra work on > > people writing the documentation. Now it has to follow special > > formatting rules, and sometimes the source is uglier (e.g., the horrible > > +-continuation in lists). Are authors now responsible for formatting any > > changes they make to make sure they look good in asciidoc? Or are people > > who care about the formatted output going to come along afterwards and > > submit fixup patches? Either way it seems like make-work. > > This part I'm slightly confused by. If you're just saying "let's > document stuff in *.h files in free-flowing text", fine. But if we're > talking about the difference between Documentation/technical/api-*.txt > and having the same stuff in manpages, then the stuff in api-*.txt *is* > already asciidoc, and we have a target to format it all up and ship it > with git, and distros ship that. Yes, the "free-flowing text" thing is more or less what I'm saying. I suspect what's in technical/* currently probably has formatting problems, and people update it (if we're lucky!) without regard to what the result looks like. > E.g. on Debian you can "apt install git-doc" and browse stuff like > file:///usr/share/doc/git-doc/technical/api-argv-array.html which is the > HTML formatted version, same for all the other api-*.txt docs. IMHO this is just silly. What am I as an end user going to do with api-argv-array.html? > This E-mail is much longer than I intended, sorry about that. I didn't > just mean to bitch about how we ship docs, but I was wondering if there > was a desire to move to something like what I've outlined above, or > whether the status quo was mostly by design and intended. I think we're actually half-way between status quos. The stuff in technical/api-* has existed for a long time. We've been slowly moving stuff over to header files, but there's still a bit of stuff there. Building those files as asciidoc is one of those things that just cropped up along the way. I have never thought it was useful myself, but somebody bothered to write the patches, so OK. The person who did so is not even somebody who has otherwise worked on the project, AFAICT. -Peff
Hi, Jeff King wrote: > On Wed, Sep 26, 2018 at 10:44:33PM +0200, Ævar Arnfjörð Bjarmason wrote: >> In terms of getting an overview it's indistinguishable from >> comments. I.e. there's nothing like an index of: >> >> man git-api-strbuf ==> working with strings >> man git-api-sha1-array ==> list, iterate and binary lookup SHA1s > > I agree that is a problem, especially for contributors less familiar > with the code base. But I think generating an index is a separate (and > much easier) problem than formatting all of the documentation. > > We already have the /** convention I mentioned above. Could we have > another micro-format like: > > /** API:strbuf - working with strings */ > > in each header file? That would make generating such an index pretty > trivial. Can you spell out the problem for me a little more? E.g. if we had a convention that the first comment in strbuf.h should say /* strbuf - Git's standard string type */ or even just /* Git's standard string type */ would that do the trick? >> I'm also not in the habit of opening the *.h file for something, I >> usually just start reading the *.c and only later discover there's some >> API docs in the relevant *.h (or not). > > Interesting. I'm not totally opposed to putting the documentation > alongside the actual code. It does feel a bit cluttered to me, but I > think you're right that it keeps everything as close together as > possible. I've experienced projects following both conventions. One thing I like a lot about documentation in the header file is that it encourages people to make the API documentation self-contained. That is, you describe the contract in a way that doesn't require reading the implementation for details. It took me a while to get used to this kind of convention but now I really like it. So I really do prefer to keep putting API documentation in the header files in Git. (Implementation documentation in the source files is of course also very welcome.) >> It means you can't avoid seeing it or updating it when source >> spelunking, and it leaves header files short, which is useful when you'd >> like to get a general API overview without all the docs. Our documented >> headers are quite fat, e.g. strbuf.h is 60% of the size of strbuf.c, but >> less than 20% when you strip the docs. > > I don't use folds in my editor, and I guess nobody else in this thread > does, either. But they may be a reasonable tool for "wow, there are > comments, declarations, and definitions all together and I just want to > view one of them". In vim, try "set foldmethod=syntax" and then "zc" to > close the folds. I use kythe to get an outline view of the header files. [...] >> E.g. on Debian you can "apt install git-doc" and browse stuff like >> file:///usr/share/doc/git-doc/technical/api-argv-array.html which is the >> HTML formatted version, same for all the other api-*.txt docs. > > IMHO this is just silly. What am I as an end user going to do with > api-argv-array.html? In Debian we just ship all the docs. For something like technical/pack-heuristics, it's quite nice. For the API docs, I think they belong in the headers. Thanks, Jonathan
On Wed, Sep 26, 2018 at 11:34:04PM -0700, Jonathan Nieder wrote: > > We already have the /** convention I mentioned above. Could we have > > another micro-format like: > > > > /** API:strbuf - working with strings */ > > > > in each header file? That would make generating such an index pretty > > trivial. > > Can you spell out the problem for me a little more? E.g. if we had a > convention that the first comment in strbuf.h should say > > /* strbuf - Git's standard string type */ > > or even just > > /* Git's standard string type */ > > would that do the trick? I assumed that not all header files would want to advertise themselves as a public API, so we'd want to some way to differentiate these from first comments in "other" header files (and I assumed we did not want a separate list of "these are the API files to go into the index", at which point you might as well just write "standard string type" in that list). But I'm happy with any convention that lets Ævar generate the index he wants. > > Interesting. I'm not totally opposed to putting the documentation > > alongside the actual code. It does feel a bit cluttered to me, but I > > think you're right that it keeps everything as close together as > > possible. > > I've experienced projects following both conventions. One thing I like > a lot about documentation in the header file is that it encourages > people to make the API documentation self-contained. That is, you > describe the contract in a way that doesn't require reading the > implementation for details. > > It took me a while to get used to this kind of convention but now I > really like it. So I really do prefer to keep putting API > documentation in the header files in Git. (Implementation > documentation in the source files is of course also very welcome.) Yeah, I'd agree with all of that. > I use kythe to get an outline view of the header files. Looks neat. Doesn't seem to be package for Debian, though. ;) > In Debian we just ship all the docs. For something like > technical/pack-heuristics, it's quite nice. For the API docs, I think > they belong in the headers. Yes, I'd agree with that. About half of technical/* is design docs, etc, that might be of use to random folks. But the internal code APIs are really only useful if you have an actual clone of git.git. -Peff
On Wed, Sep 26, 2018 at 02:58:12PM -0400, Jeff King wrote: > And if we were going to generate something external, would it make more > sense to write in a structured format like doxygen? I am not a big fan > of it myself, but at least from there you can generate a more richly > interconnected set of documentation. By the way, I tried running Doxygen on the current code-base. I _still_ hate the presentation of it (mostly because it's jarring to jump out of my terminal to a web browser). But it actually does an OK job of showing strbuf.h, with our overview attached to the struct and individual functions correctly documented. There are a couple of formatting hiccups, but it's actually pretty close. -Peff
On Wed, Sep 26 2018, Stefan Beller wrote: > On Wed, Sep 26, 2018 at 1:44 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: > >> > And if we were going to generate something external, would it make more >> > sense to write in a structured format like doxygen? I am not a big fan >> > of it myself, but at least from there you can generate a more richly >> > interconnected set of documentation. >> >> It's useful to have a single authoritative source for all documentation >> that's easy to search through. > > If that is the case I would propose to keep it all in header files and > organize the headers. > >> That includes stuff like perl585delta(1) which we'd stick in >> Documentation/RelNotes, and "Internals and C Language Interface". Most >> of what we'd put in Documentation/technical/api-* & headers is in >> perlapi(1). > > This seems cool, but was also a recent introduction? > perl400delta seems to yield nothing for me (which may be because > I do not have an old version of perl installed?) Depends on what you think is "recent" I suppose. Perl 5.4 is the first version where deltas in POD format started being maintained consistently, that version was released in mid-1997. Perl 4 was released in 1991, see "perldoc perlhist". So ~everything consistently in POD has been the case for ~20 years.
Jeff King <peff@peff.net> writes: > If you're interested in pulling documentation out of the header files > and generating asciidoc from it, I'm happy to at least try keeping it up > to date. When we started putting this information into header files, we > used "/**" to start the comment, as a special marker to indicate it was > worth pulling out. I don't know how well we've maintained that > convention, but it's a starting point. I noticed some people add these extra asterisk at the beginning of comment, but I do not recall that we declared it is a convention we adopt, so I'd rather be surprised if we've "maintained" it. Please have it in CodingGuidelines or somewhere once this thread settles and we decide to keep that convention (I have no problem with the convention; it is just I personally didn't think it was worth doing myself at least until now that we might have found somebody who wants to make use of the markings).
On Thu, Sep 27, 2018 at 10:36:11AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > If you're interested in pulling documentation out of the header files > > and generating asciidoc from it, I'm happy to at least try keeping it up > > to date. When we started putting this information into header files, we > > used "/**" to start the comment, as a special marker to indicate it was > > worth pulling out. I don't know how well we've maintained that > > convention, but it's a starting point. > > I noticed some people add these extra asterisk at the beginning of > comment, but I do not recall that we declared it is a convention we > adopt, so I'd rather be surprised if we've "maintained" it. True. _I_ declared it as a convention and used it when I migrated some of the initial api-* documents, but I don't know if anybody else followed that lead. FWIW, it is not my own convention, but one used by other tools like doxygen (which in turn got it from javadoc, I think). > Please have it in CodingGuidelines or somewhere once this thread > settles and we decide to keep that convention (I have no problem > with the convention; it is just I personally didn't think it was > worth doing myself at least until now that we might have found > somebody who wants to make use of the markings). Yeah, this is basically why I hadn't pushed harder for it. If nobody is actually extracting based on the convention, there is not much point. So I was waiting for somebody to show up with an interest in using an extraction tool. -Peff
diff --git a/sha1-array.c b/sha1-array.c index b94e0ec0f5e..d922e94e3fc 100644 --- a/sha1-array.c +++ b/sha1-array.c @@ -77,3 +77,20 @@ int oid_array_for_each_unique(struct oid_array *array, } return 0; } + +void oid_array_filter(struct oid_array *array, + for_each_oid_fn want, + void *cb_data) +{ + unsigned nr = array->nr, src, dst; + struct object_id *oids = array->oid; + + for (src = dst = 0; src < nr; src++) { + if (want(&oids[src], cb_data)) { + if (src != dst) + oidcpy(&oids[dst], &oids[src]); + dst++; + } + } + array->nr = dst; +} diff --git a/sha1-array.h b/sha1-array.h index 232bf950172..275e5b02f5e 100644 --- a/sha1-array.h +++ b/sha1-array.h @@ -23,4 +23,13 @@ int oid_array_for_each_unique(struct oid_array *array, for_each_oid_fn fn, void *data); +/* + * Apply want to each entry in array, retaining only the entries for + * which the function returns true. Preserve the order of the entries + * that are retained. + */ +void oid_array_filter(struct oid_array *array, + for_each_oid_fn want, + void *cbdata); + #endif /* SHA1_ARRAY_H */
Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Stefan Beller <sbeller@google.com> --- sha1-array.c | 17 +++++++++++++++++ sha1-array.h | 9 +++++++++ 2 files changed, 26 insertions(+)