[00/25] Documentation fixes

Return-Path: <git-owner@vger.kernel.org>
Message-ID: <pull.1595.git.1696747527.gitgitgadget@gmail.com>
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
Date: Sun, 08 Oct 2023 06:45:02 +0000
Subject: [PATCH 00/25] Documentation fixes
Fcc: Sent
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>
Precedence: bulk

Elijah Newren via GitGitGadget Oct. 8, 2023, 6:45 a.m. UTC

It turns out that AI is pretty good at making small fixes to documentation;
certainly not perfect, but it provides quite good signal. Unfortunately,
there is a lot to sift through. Some points about my strategy:

 * I ignored a few categories of things, such as British vs. American
   spellings (though being consistent on that might be a nice change).

 * I dropped many changes where I thought there wasn't an issue being
   corrected but just a switch of style being suggested (though I accepted a
   number of these types of changes before later deciding to drop them).

 * I also started discarding lower-priority changes like comma placement.
   I accepted a number of these that I thought made the documentation
   clearer, but I eventually just started dropping them, as I had spent far
   more hours than I expected on this series.

 * A few of the changes AI suggested were bad (it really shouldn't mess with
   protocol and RFC-like text), so it was definitely useful to review them.

 * Occasionally I noticed an even better improvement and tweaked the text
   accordingly.

I did review every single change here, multiple times, and I have tried to
split up this series in a way that makes it easier to review. In particular, I:

 * dropped any changes that conflicted with anything in 'next' or 'seen'. I
   may resubmit some of those later.

 * split it into a bunch of patches based on various grammatical rules being
   corrected

 * sometimes when a single line had multiple types of changes, I split the
   fixes across multiple patches in order to group types of changes

 * sometimes splitting the single-line changes seemed like too much, and I
   just folded the multiple changes to a single line into the first patch
   in the series, "wording improvements".

Let me know if there are other changes that would make this easier to
review. In a sense, though, it has already had multiple eyes looking at
it; it's just that one pair of those eyes was artificial. :-)

(Note: every patch in this series, except for the whitespace fixes patch,
is best viewed with --color-words.)

Elijah Newren (25):
  documentation: wording improvements
  documentation: fix small error
  documentation: fix typos
  documentation: fix apostrophe usage
  documentation: add missing words
  documentation: remove extraneous words
  documentation: fix subject/verb agreement
  documentation: employ consistent verb tense for a list
  documentation: fix verb tense
  documentation: fix adjective vs. noun
  documentation: fix verb vs. noun
  documentation: fix singular vs. plural
  documentation: whitespace is already generally plural
  documentation: fix choice of article
  documentation: add missing article
  documentation: remove unnecessary hyphens
  documentation: add missing hyphens
  documentation: use clearer prepositions
  documentation: fix punctuation
  documentation: fix capitalization
  documentation: fix whitespace issues
  documentation: add some commas where they are helpful
  documentation: add missing fullstops
  documentation: add missing quotes
  documentation: add missing parenthesis

 Documentation/CodingGuidelines                |  6 ++---
 Documentation/ReviewingGuidelines.txt         |  4 +--
 Documentation/SubmittingPatches               |  2 +-
 Documentation/ToolsForGit.txt                 |  4 +--
 Documentation/config.txt                      |  6 ++---
 Documentation/config/advice.txt               |  8 +++---
 Documentation/config/alias.txt                |  2 +-
 Documentation/config/apply.txt                |  4 +--
 Documentation/config/branch.txt               |  6 ++---
 Documentation/config/checkout.txt             |  4 +--
 Documentation/config/clean.txt                |  2 +-
 Documentation/config/clone.txt                |  4 +--
 Documentation/config/color.txt                |  2 +-
 Documentation/config/column.txt               |  4 +--
 Documentation/config/commit.txt               |  4 +--
 Documentation/config/credential.txt           |  4 +--
 Documentation/config/diff.txt                 |  2 +-
 Documentation/config/fastimport.txt           |  4 +--
 Documentation/config/fetch.txt                |  4 +--
 Documentation/config/format.txt               |  2 +-
 Documentation/config/fsck.txt                 | 22 ++++++++--------
 Documentation/config/fsmonitor--daemon.txt    |  2 +-
 Documentation/config/gc.txt                   |  4 +--
 Documentation/config/gpg.txt                  |  6 ++---
 Documentation/config/gui.txt                  |  2 +-
 Documentation/config/http.txt                 |  4 +--
 Documentation/config/i18n.txt                 |  2 +-
 Documentation/config/imap.txt                 |  4 +--
 Documentation/config/index.txt                |  2 +-
 Documentation/config/log.txt                  |  2 +-
 Documentation/config/mailinfo.txt             |  2 +-
 Documentation/config/maintenance.txt          |  2 +-
 Documentation/config/man.txt                  |  2 +-
 Documentation/config/merge.txt                |  2 +-
 Documentation/config/mergetool.txt            | 12 ++++-----
 Documentation/config/notes.txt                |  2 +-
 Documentation/config/pack.txt                 | 10 +++----
 Documentation/config/push.txt                 |  4 +--
 Documentation/config/receive.txt              |  4 +--
 Documentation/config/rerere.txt               |  2 +-
 Documentation/config/safe.txt                 |  4 +--
 Documentation/config/sendemail.txt            |  4 +--
 Documentation/config/sequencer.txt            |  2 +-
 Documentation/config/splitindex.txt           |  6 ++---
 Documentation/config/stash.txt                |  8 +++---
 Documentation/config/status.txt               |  4 +--
 Documentation/config/submodule.txt            |  4 +--
 Documentation/config/trace2.txt               |  2 +-
 Documentation/config/transfer.txt             |  4 +--
 Documentation/config/user.txt                 | 10 +++----
 Documentation/config/versionsort.txt          |  6 ++---
 Documentation/diff-generate-patch.txt         | 26 +++++++++----------
 Documentation/diff-options.txt                |  2 +-
 Documentation/fetch-options.txt               |  4 +--
 Documentation/fsck-msgids.txt                 |  4 +--
 Documentation/git-am.txt                      | 12 ++++-----
 Documentation/git-apply.txt                   | 14 +++++-----
 Documentation/git-archive.txt                 | 16 ++++++------
 Documentation/git-blame.txt                   |  8 +++---
 Documentation/git-bugreport.txt               |  9 ++++---
 Documentation/git-check-attr.txt              |  6 ++---
 Documentation/git-check-ignore.txt            |  2 +-
 Documentation/git-check-ref-format.txt        |  4 +--
 Documentation/git-checkout-index.txt          | 10 +++----
 Documentation/git-checkout.txt                |  2 +-
 Documentation/git-clean.txt                   |  2 +-
 Documentation/git-count-objects.txt           |  6 ++---
 Documentation/git-credential-cache.txt        |  2 +-
 Documentation/git-credential-store.txt        |  2 +-
 Documentation/git-credential.txt              |  2 +-
 Documentation/git-daemon.txt                  |  2 +-
 Documentation/git-diff-files.txt              |  6 ++---
 Documentation/git-diff-index.txt              |  4 +--
 Documentation/git-diff-tree.txt               | 12 ++++-----
 Documentation/git-difftool.txt                |  4 +--
 Documentation/git-fast-import.txt             |  4 +--
 Documentation/git-fetch-pack.txt              |  2 +-
 Documentation/git-format-patch.txt            |  2 +-
 Documentation/git-fsck.txt                    |  6 ++---
 Documentation/git-fsmonitor--daemon.txt       | 10 +++----
 Documentation/git-get-tar-commit-id.txt       |  2 +-
 Documentation/git-grep.txt                    |  2 +-
 Documentation/git-hash-object.txt             |  8 +++---
 Documentation/git-help.txt                    | 18 ++++++-------
 Documentation/git-hook.txt                    |  4 +--
 Documentation/git-http-backend.txt            | 10 +++----
 Documentation/git-http-fetch.txt              |  2 +-
 Documentation/git-http-push.txt               | 10 +++----
 Documentation/git-index-pack.txt              | 10 +++----
 Documentation/git-init.txt                    | 10 +++----
 Documentation/git-ls-files.txt                | 10 +++----
 Documentation/git-mailsplit.txt               |  2 +-
 Documentation/git-maintenance.txt             |  6 ++---
 Documentation/git-merge-base.txt              | 12 ++++-----
 Documentation/git-merge-tree.txt              |  8 +++---
 Documentation/git-merge.txt                   |  2 +-
 Documentation/git-mergetool--lib.txt          | 10 +++----
 Documentation/git-mergetool.txt               |  8 +++---
 Documentation/git-mktag.txt                   |  6 ++---
 Documentation/git-mktree.txt                  |  4 +--
 Documentation/git-mv.txt                      |  2 +-
 Documentation/git-name-rev.txt                |  2 +-
 Documentation/git-prune-packed.txt            |  2 +-
 Documentation/git-prune.txt                   |  2 +-
 Documentation/git-push.txt                    |  2 +-
 Documentation/git-quiltimport.txt             |  4 +--
 Documentation/git-range-diff.txt              |  2 +-
 Documentation/git-read-tree.txt               |  6 ++---
 Documentation/git-receive-pack.txt            |  4 +--
 Documentation/git-remote-ext.txt              | 10 +++----
 Documentation/git-remote-fd.txt               | 10 +++----
 Documentation/git-repack.txt                  |  4 +--
 Documentation/git-replace.txt                 |  4 +--
 Documentation/git-request-pull.txt            |  4 +--
 Documentation/git-restore.txt                 |  4 +--
 Documentation/git-rev-list.txt                |  4 +--
 Documentation/git-rev-parse.txt               |  8 +++---
 Documentation/git-rm.txt                      |  2 +-
 Documentation/git-send-email.txt              |  4 +--
 Documentation/git-send-pack.txt               | 16 ++++++------
 Documentation/git-sh-setup.txt                |  2 +-
 Documentation/git-show-branch.txt             | 12 ++++-----
 Documentation/git-show-ref.txt                |  2 +-
 Documentation/git-show.txt                    |  2 +-
 Documentation/git-status.txt                  |  2 +-
 Documentation/git-stripspace.txt              |  6 ++---
 Documentation/git-symbolic-ref.txt            |  2 +-
 Documentation/git-update-index.txt            | 18 ++++++-------
 Documentation/git-update-ref.txt              |  2 +-
 Documentation/git-update-server-info.txt      |  4 +--
 Documentation/git-upload-pack.txt             |  2 +-
 Documentation/git-var.txt                     |  2 +-
 Documentation/git-verify-pack.txt             |  6 ++---
 Documentation/git-whatchanged.txt             |  8 +++---
 Documentation/gitcli.txt                      |  8 +++---
 Documentation/gitdiffcore.txt                 | 14 +++++-----
 Documentation/giteveryday.txt                 |  2 +-
 Documentation/gitformat-bundle.txt            |  8 +++---
 Documentation/gitformat-chunk.txt             |  4 +--
 Documentation/gitformat-pack.txt              | 18 ++++++-------
 Documentation/githooks.txt                    | 10 +++----
 Documentation/gitprotocol-capabilities.txt    | 20 +++++++-------
 Documentation/gitprotocol-common.txt          |  2 +-
 Documentation/gitprotocol-http.txt            |  6 ++---
 Documentation/gitprotocol-pack.txt            |  6 ++---
 Documentation/gitprotocol-v2.txt              |  2 +-
 Documentation/gitsubmodules.txt               |  6 ++---
 Documentation/gitweb.conf.txt                 |  2 +-
 Documentation/gitweb.txt                      | 16 ++++++------
 Documentation/glossary-content.txt            |  2 +-
 .../howto/coordinate-embargoed-releases.txt   |  2 +-
 Documentation/howto/maintain-git.txt          |  6 ++---
 Documentation/howto/use-git-daemon.txt        |  2 +-
 Documentation/howto/using-merge-subtree.txt   |  2 +-
 Documentation/i18n.txt                        |  4 +--
 Documentation/mergetools/vimdiff.txt          |  6 ++---
 Documentation/pretty-options.txt              |  4 +--
 Documentation/pull-fetch-param.txt            |  6 ++---
 Documentation/rev-list-options.txt            |  4 +--
 Documentation/technical/api-index-skel.txt    |  2 +-
 Documentation/technical/api-simple-ipc.txt    | 10 +++----
 Documentation/technical/bitmap-format.txt     |  6 ++---
 Documentation/technical/commit-graph.txt      |  2 +-
 Documentation/technical/parallel-checkout.txt | 10 +++----
 Documentation/technical/partial-clone.txt     |  8 +++---
 Documentation/technical/racy-git.txt          | 10 +++----
 Documentation/technical/reftable.txt          | 10 +++----
 .../technical/repository-version.txt          |  2 +-
 Documentation/technical/rerere.txt            |  6 ++---
 Documentation/urls-remotes.txt                |  4 +--
 Documentation/urls.txt                        |  4 +--
 171 files changed, 478 insertions(+), 477 deletions(-)


base-commit: 3a06386e314565108ad56a9bdb8f7b80ac52fb69
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1595%2Fnewren%2Fdoc-fixes-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1595/newren/doc-fixes-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1595

Taylor Blau Oct. 9, 2023, 1:44 a.m. UTC | #1

On Sun, Oct 08, 2023 at 06:45:02AM +0000, Elijah Newren via GitGitGadget wrote:
> It turns out that AI is pretty good at making small fixes to documentation;
> certainly not perfect, but it provides quite good signal. Unfortunately,
> there is a lot to sift through. Some points about my strategy:

Quite interesting ;-).

I'm curious to learn a little bit more about your
strategy beyond what you wrote:

  - What tool did you use? ChatGPT? Something home-grown?
  - (Assuming this was generated by some sort of LLM): what did you
    prompt it with?
  - What was the output format: the edited text in its entirety, or a
    patch that can be applied on top?

Thanks,
Taylor

Elijah Newren Oct. 9, 2023, 4:46 p.m. UTC | #2

On Sun, Oct 8, 2023 at 6:44 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Sun, Oct 08, 2023 at 06:45:02AM +0000, Elijah Newren via GitGitGadget wrote:
> > It turns out that AI is pretty good at making small fixes to documentation;
> > certainly not perfect, but it provides quite good signal. Unfortunately,
> > there is a lot to sift through. Some points about my strategy:
>
> Quite interesting ;-).
>
> I'm curious to learn a little bit more about your
> strategy beyond what you wrote:
>
>   - What tool did you use? ChatGPT? Something home-grown?

A mixture of gpt-4 and gpt-4-32k (I would have just used gpt-4, but
trying to give it a full file blows the token limit on several of
Git's documentation files).
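One way to reproduce the per-file model choice described above is to estimate the prompt's size and use gpt-4 only when it fits. The sketch below assumes roughly four characters per token and a hypothetical `reserve` allowance for the reply; neither figure comes from the thread. The limits are the 8,192- and 32,768-token context windows of gpt-4 and gpt-4-32k.

```python
# Context-window sizes (in tokens) of the two models mentioned above.
CONTEXT_LIMITS = {"gpt-4": 8192, "gpt-4-32k": 32768}


def estimate_tokens(text: str) -> int:
    """Crude estimate, assuming about four characters per token of English text."""
    return len(text) // 4 + 1


def choose_model(prompt: str, reserve: int = 2048) -> str:
    """Return the smaller model if the prompt plus room for a reply fits, else the 32k one.

    `reserve` is a hypothetical allowance for the response; if the model is asked
    to echo the whole file back, the reserve would need to be about the file's size.
    """
    needed = estimate_tokens(prompt) + reserve
    for model in ("gpt-4", "gpt-4-32k"):
        if needed <= CONTEXT_LIMITS[model]:
            return model
    raise ValueError(f"prompt needs ~{needed} tokens; too large for either model")
```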

Also, it was sent to an internally hosted instance.  On this internal
instance, it seemed to require passing the
api-version=2023-03-15-preview parameter.  I don't really know what
that parameter means, but I suspect it might refer to a roughly
six-month-old version of gpt-4?

>   - (Assuming this was generated by some sort of LLM): what did you
>     prompt it with?

Each prompt covered exactly one file and read as follows:

"""
For the asciidoc file below, are there any typos, grammatical errors,
or wording problems?  If so, please highlight them along with proposed
corrections:

--------------------
${FILE_CONTENTS}
"""

If I had to do it over, I'd be much more explicit about the output
format.  Probably, "Please respond by outputting the full file, with
any corrections included.  If there are no corrections, simply output
the original file as-is." which would allow me to simply diff the
output and look at the changes.
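If the model does return the full corrected file, as in the prompt wording proposed above, the review step reduces to an ordinary diff. A minimal sketch using Python's `difflib`; `review_diff` is a hypothetical helper name.

```python
import difflib


def review_diff(original: str, corrected: str, name: str = "file") -> str:
    """Return a unified diff between the original file and the model's full-file reply."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            corrected.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )
```

An empty result means the model proposed no changes. Alternatively, writing the reply to a scratch file and running `git diff --no-index --color-words` on the two files gives the same word-level view recommended for reviewing the patches.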

Also, I would probably specify that "The AsciiDoc file starts three
lines below, just after the line of dashes", hoping that would keep it
from sometimes presuming that the dashes were part of the file.
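Both suggestions can be folded into a revised prompt. The sketch below keeps the original 20-dash delimiter, states the full-file output format and where the file begins, and strips the delimiter if the model echoes it back anyway. Matching only the exact 20-dash line is an assumed design choice: AsciiDoc also uses runs of dashes as block delimiters, and those must survive.

```python
DELIMITER = "-" * 20

# Revised instructions combining the two suggestions above.
INSTRUCTIONS = (
    "For the AsciiDoc file below, are there any typos, grammatical errors,\n"
    "or wording problems?  Please respond by outputting the full file, with\n"
    "any corrections included.  If there are no corrections, simply output\n"
    "the original file as-is.  The AsciiDoc file starts three lines below,\n"
    "just after the line of dashes."
)


def build_revised_prompt(file_contents: str) -> str:
    """Instructions, a blank line, the delimiter, then the file itself."""
    return f"{INSTRUCTIONS}\n\n{DELIMITER}\n{file_contents}"


def strip_echoed_delimiter(reply: str) -> str:
    """Drop the reply's first line if the model copied the delimiter back."""
    first, _, rest = reply.partition("\n")
    return rest if first.strip() == DELIMITER else reply
```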

>   - What was the output format: the edited text in its entirety, or a
>     patch that can be applied on top?

My wording was unfortunately vague, so sometimes I got human prose
telling me what change to make, sometimes I got a bulleted list in
the form "${old_text} -> ${new_text}", but most of the time it
printed the file (or a subset thereof) with corrections.  I also had
all the output concatenated into one large file, which made it "fun"
to work through all the changes.  Even when diffing files, I manually
applied any changes I saw to the actual file (which did risk
introducing new typos, and missing some of the corrections, but did
ensure I reviewed everything).
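When replies come back as "${old_text} -> ${new_text}" bullets, the pairs can be extracted mechanically and checked against the original file before being applied by hand. This parser is a hypothetical sketch: it assumes one suggestion per bullet and splits on the first " -> ", so it would mis-split old text that itself contains an arrow.

```python
import re

# A bullet ("-" or "*") followed by "old -> new"; split on the first arrow.
ARROW_LINE = re.compile(r"^\s*[-*]\s+(?P<old>.+?)\s+->\s+(?P<new>.+?)\s*$")


def _unquote(s: str) -> str:
    """Strip one pair of matching surrounding quotes, if present."""
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "\"'":
        return s[1:-1]
    return s


def parse_arrow_list(reply: str) -> list[tuple[str, str]]:
    """Collect (old, new) pairs from bulleted 'old -> new' lines, ignoring other prose."""
    pairs = []
    for line in reply.splitlines():
        m = ARROW_LINE.match(line)
        if m:
            pairs.append((_unquote(m["old"]), _unquote(m["new"])))
    return pairs


def unmatched(pairs: list[tuple[str, str]], original: str) -> list[tuple[str, str]]:
    """Return suggestions whose 'old' text does not occur in the file at all."""
    return [(old, new) for old, new in pairs if old not in original]
```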

Also, not only did I get different output formats, but there were many
times the file was cut off at some point.  I sometimes assumed that
just meant there were no changes outside that region, but there were
times when there was only one change and it had given me hundreds of
lines of context around it before it cut off, so it did leave me with
the feeling it might have processed or responded to only part of the
file.
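A cheap heuristic for spotting the cut-off replies described above, assuming the reply is meant to be the whole file: flag it when it is shorter than the original and does not end with the original's last non-blank line. The rule is an assumption, not something from the thread; requiring both conditions keeps a legitimately corrected final line from being reported as truncation.

```python
def looks_truncated(original: str, reply: str) -> bool:
    """Guess whether a full-file reply was cut off before the end of the file."""
    orig_lines = [line for line in original.splitlines() if line.strip()]
    reply_lines = [line for line in reply.splitlines() if line.strip()]
    if not orig_lines:
        return False
    if not reply_lines:
        return True
    ends_differently = reply_lines[-1].strip() != orig_lines[-1].strip()
    is_shorter = len(reply_lines) < len(orig_lines)
    return ends_differently and is_shorter
```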

There were also several times where the changes it suggested were a
no-op, making me wonder if it just failed or something.  I looked at
them really closely (sometimes even piping the output through xxd,
which is how I once noticed a change from tab-after-period to
space-after-period), but when it responded with human prose saying
something like "Change the sentence that reads '${old_version}'
-> '${old_version}'", it made me wonder whether something had just gone
haywire with the LLM and I should retry.
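The no-op and invisible-whitespace cases above can be separated from real edits by comparing the old and new text directly and by escaping control characters, which surfaces a tab much as xxd did. Both helpers are illustrative sketches, not part of any tool used in the thread.

```python
def classify_suggestion(old: str, new: str) -> str:
    """Label a suggested change as a no-op, a whitespace-only edit, or a real text edit."""
    if old == new:
        return "no-op"
    if old.split() == new.split():
        # Same words with different spacing, e.g. a tab after a period became a space.
        return "whitespace-only"
    return "text"


def show_invisible(s: str) -> str:
    """Escape tabs, newlines, and non-ASCII characters so they are visible."""
    return s.encode("unicode_escape").decode("ascii")
```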

However, even though the above issues made me think there are more
documentation issues to be found with an LLM, I didn't re-check any
files unless I got an error with no output (e.g. an excessive number of
tokens, or hitting rate limits on the API).  I didn't bother,
because the firehose of changes it provided me, even with those
caveats, was far more than enough to deal with.
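The re-check policy described above (re-send only when a request errored out or came back empty) could be wrapped around whatever call sends the prompt. `send` stands in for that call and is hypothetical, as is the exponential `backoff`; a token-limit error is usually better handled by switching to gpt-4-32k than by resending the same prompt.

```python
import time
from typing import Callable


def review_with_retry(
    send: Callable[[str], str],
    prompt: str,
    attempts: int = 3,
    backoff: float = 0.0,
) -> str:
    """Call send(prompt); retry only if it raised or returned empty output."""
    last_error = None
    for attempt in range(attempts):
        try:
            reply = send(prompt)
        except Exception as exc:  # e.g. a rate-limit error raised by the API client
            last_error = exc
        else:
            if reply.strip():
                return reply
        if attempt + 1 < attempts:
            time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"no output after {attempts} attempts") from last_error
```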

Junio C Hamano Oct. 16, 2023, 9:54 p.m. UTC | #3

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> I did review every single change here, multiple times, and I have tried to
> split up this series in a way to make it easier to review. In particular I:
> ...
> (Note: every patch in this series, except for the whitespace fixes patch,
> are best viewed with --color-words.)

I didn't think of anything clever, so I ended up reading it bit by bit
over several days.  I didn't find anything glaringly wrong ;-)

Let's mark it for 'next'.

Thanks.
