[2/3] t1405: mark test that checks existence as REFFILES

Message ID	1ded69d89709d23147b29f67122b659293414405.1643651420.git.gitgitgadget@gmail.com (mailing list archive)
State	Accepted
Commit	53af25e47c5f1261c7da47ca4e01ed9d9f3ffcab
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <1ded69d89709d23147b29f67122b659293414405.1643651420.git.gitgitgadget@gmail.com> In-Reply-To: <pull.1209.git.git.1643651420.gitgitgadget@gmail.com> References: <pull.1209.git.git.1643651420.gitgitgadget@gmail.com> Date: Mon, 31 Jan 2022 17:50:19 +0000 Subject: [PATCH 2/3] t1405: mark test that checks existence as REFFILES Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys <hanwenn@gmail.com>, Han-Wen Nienhuys <hanwen@google.com> Precedence: bulk From: Han-Wen Nienhuys <hanwen@google.com>
Series	reftable related test tweaks \| expand [0/3] reftable related test tweaks [1/3] t1405: explictly delete reflogs for reftable [2/3] t1405: mark test that checks existence as REFFILES [3/3] t5312: prepare for reftable

Han-Wen Nienhuys Jan. 31, 2022, 5:50 p.m. UTC

From: Han-Wen Nienhuys <hanwen@google.com>

The reftable backend doesn't support mere existence of reflogs.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/t1405-main-ref-store.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Taylor Blau Jan. 31, 2022, 9:26 p.m. UTC | #1

On Mon, Jan 31, 2022 at 05:50:19PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> From: Han-Wen Nienhuys <hanwen@google.com>
>
> The reftable backend doesn't support mere existence of reflogs.

Perhaps I'm missing something obvious, but this and the previous patch
seem to be conflicting each other.

My understanding of the previous change is that you wanted a reflog
entry when the REFFILES prerequisite isn't met. But this patch says what
matches my understanding is that reftable and reflogs do not play
together.

If reflogs do not interact with the reftable backend, then what does
this patch do?

Thanks,
Taylor

Junio C Hamano Jan. 31, 2022, 10:15 p.m. UTC | #2

Taylor Blau <me@ttaylorr.com> writes:

> On Mon, Jan 31, 2022 at 05:50:19PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
>> From: Han-Wen Nienhuys <hanwen@google.com>
>>
>> The reftable backend doesn't support mere existence of reflogs.
>
> Perhaps I'm missing something obvious, but this and the previous patch
> seem to be conflicting each other.
>
> My understanding of the previous change is that you wanted a reflog
> entry when the REFFILES prerequisite isn't met. But this patch says what
> matches my understanding is that reftable and reflogs do not play
> together.
>
> If reflogs do not interact with the reftable backend, then what does
> this patch do?

One difference between the files and the reftable backend is that
with the files backend, you can say "I am not adding any entry yet,
but remember that reflog is enabled for this ref, while all other
refs reflog is not enabled", and the way to do so is to touch the
"$GIT_DIR/logs/refs/heads/frotz" file---this enables reflog for the
"frotz" branch, even if core.logAllRefUpdates is not set.

Because there is no generic reflog API that says "enable log for
this ref", a test that checks this feature with files backend would
do "touch .git/refs/heads/frotz".

I didn't look at "this patch", but it is understandable if such a
test needs to be skipped via REFFILES prerequisite, because the
reftable backend lacks this feature.

Han-Wen Nienhuys Feb. 1, 2022, 8:06 p.m. UTC | #3

On Mon, Jan 31, 2022 at 11:15 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Taylor Blau <me@ttaylorr.com> writes:
>
> > On Mon, Jan 31, 2022 at 05:50:19PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> >> From: Han-Wen Nienhuys <hanwen@google.com>
> >>
> >> The reftable backend doesn't support mere existence of reflogs.
> >
> > Perhaps I'm missing something obvious, but this and the previous patch
> > seem to be conflicting each other.
> >
> > My understanding of the previous change is that you wanted a reflog
> > entry when the REFFILES prerequisite isn't met. But this patch says what
> > matches my understanding is that reftable and reflogs do not play
> > together.
> >
> > If reflogs do not interact with the reftable backend, then what does
> > this patch do?
>
> One difference between the files and the reftable backend is that
> with the files backend, you can say "I am not adding any entry yet,
> but remember that reflog is enabled for this ref, while all other
> refs reflog is not enabled", and the way to do so is to touch the
> "$GIT_DIR/logs/refs/heads/frotz" file---this enables reflog for the
> "frotz" branch, even if core.logAllRefUpdates is not set.
>
> Because there is no generic reflog API that says "enable log for
> this ref", a test that checks this feature with files backend would
> do "touch .git/refs/heads/frotz".

There is refs_create_reflog(), so the generic reflog API exists. The
problem is that there is no sensible way to implement it in reftable.

One option is (reflog exists == there exists at least one reflog entry
for the ref). This messes up the test from this patch, because it
creates a reflog, but because it doesn't populate the reflog, so we
return false for git-reflog-exists.

It also turns out to mess up the tests in t3420, as follows:

++ git stash show -p
error: refs/stash@{0} is not a valid reference

I get

  reflog_exists: refs/stash: 0

and "git stash show -p" aborts with "error: refs/stash@{0} is not a
valid reference".

So I now went with the other option, ie. (reflog exists == true), ie.
every conceivable ref has a reflog (but most are empty). This makes
t3420 pass.

This behavior also confuses t1405, because in

  $RUN delete-reflog HEAD &&
  test_must_fail git reflog exists HEAD

the last command now always returns true.

Junio C Hamano Feb. 1, 2022, 9:03 p.m. UTC | #4

Han-Wen Nienhuys <hanwen@google.com> writes:

>> Because there is no generic reflog API that says "enable log for
>> this ref", a test that checks this feature with files backend would
>> do "touch .git/refs/heads/frotz".
>
> There is refs_create_reflog(), so the generic reflog API exists. The
> problem is that there is no sensible way to implement it in reftable.

Ah, yes, that's correct.

> One option is (reflog exists == there exists at least one reflog entry
> for the ref).

Because the current callers of refs_create_reflog() does want a
reflog created that does not give any entry when iterated, I agree
with you that adding a "fake" reflog entry alone is not a sufficient
emulation of the API.  I think these are all ...

> This messes up the test from this patch, because it
> creates a reflog, but because it doesn't populate the reflog, so we
> return false for git-reflog-exists.
>
> It also turns out to mess up the tests in t3420, as follows:
>
> ++ git stash show -p
> error: refs/stash@{0} is not a valid reference
>
> I get
>
>   reflog_exists: refs/stash: 0
>
> and "git stash show -p" aborts with "error: refs/stash@{0} is not a
> valid reference".

... indications of hat.

I wonder if it is simple and easy to add a new reflog entry type
used as an implementation detail of the reftable.  If we can do so,
then, the reftable backend integrated to the ref API can do these
things:

 - reflog_exists() can say yes when one reflog entry of any type
   (internal to the reftable implementation) exists for the ref;

 - create_reflog() can add a reflog entry of the "fake" type
   (internal to the reftable implementation);

 - for_each_reflog_ent() and its reverse can learn to skip such a
   fake reflog entry.

As there is no way to ask, via the API, the number of the existing
reflog entries, the ref API callers would not be able to tell such
an implementation detail that with reftable backend, create_reflog()
does not create an empty reflog.  To them, a reflog created with the
API call would truly be empty as iterators will not return anything.

Or do we have a list of refs kept somewhere in the reftable data
structure in a separate chunk?  Do we have a bit for each of these
refs to record if the log is enabled for it?  Then instead of the
fake reflog entry, we could implement the necessary semantics a lot
more cleanly:

 - reflog_exists() can just peek the bit.

 - create_reflog() can just flip the bit.

 - there is no need to touch the iterators.

 - the equivalent to files_log_ref_write() can decide based on the
   bit (i.e. what reflog_exists() says) whether to log changes to
   the ref.

It is probably a lot more sensible to fail refs_create_reflog() and
safe_create_reflog() (which is a thin wrapper around the former), if
we cannot implement "a reflog can exist and have no entries yet"
semantics.

Outside the test helper, the only place the helper is used is
"checkout -l" when should_autocreate_reflog() returns false, which
should be rare (as we are updating a branch, it should be either a
detached HEAD or a ref under refs/heads/), so it would not be a huge
practical downside if we cannot prepare an empty reflog anyway, I
would think.

Thanks.

Ævar Arnfjörð Bjarmason Feb. 1, 2022, 9:22 p.m. UTC | #5

On Tue, Feb 01 2022, Junio C Hamano wrote:

> Han-Wen Nienhuys <hanwen@google.com> writes:
>
>>> Because there is no generic reflog API that says "enable log for
>>> this ref", a test that checks this feature with files backend would
>>> do "touch .git/refs/heads/frotz".
>>
>> There is refs_create_reflog(), so the generic reflog API exists. The
>> problem is that there is no sensible way to implement it in reftable.
>
> Ah, yes, that's correct.
>
>> One option is (reflog exists == there exists at least one reflog entry
>> for the ref).
>
> Because the current callers of refs_create_reflog() does want a
> reflog created that does not give any entry when iterated, I agree
> with you that adding a "fake" reflog entry alone is not a sufficient
> emulation of the API.  I think these are all ...
>
>> This messes up the test from this patch, because it
>> creates a reflog, but because it doesn't populate the reflog, so we
>> return false for git-reflog-exists.
>>
>> It also turns out to mess up the tests in t3420, as follows:
>>
>> ++ git stash show -p
>> error: refs/stash@{0} is not a valid reference
>>
>> I get
>>
>>   reflog_exists: refs/stash: 0
>>
>> and "git stash show -p" aborts with "error: refs/stash@{0} is not a
>> valid reference".
>
> ... indications of hat.
>
> I wonder if it is simple and easy to add a new reflog entry type
> used as an implementation detail of the reftable.  If we can do so,
> then, the reftable backend integrated to the ref API can do these
> things:
>
>  - reflog_exists() can say yes when one reflog entry of any type
>    (internal to the reftable implementation) exists for the ref;
>
>  - create_reflog() can add a reflog entry of the "fake" type
>    (internal to the reftable implementation);
>
>  - for_each_reflog_ent() and its reverse can learn to skip such a
>    fake reflog entry.
>
> As there is no way to ask, via the API, the number of the existing
> reflog entries, the ref API callers would not be able to tell such
> an implementation detail that with reftable backend, create_reflog()
> does not create an empty reflog.  To them, a reflog created with the
> API call would truly be empty as iterators will not return anything.

We could surely add magic record types, but how would such a dance be
performed while keeping compatibility with existing JGit clients?

Junio C Hamano Feb. 1, 2022, 10:11 p.m. UTC | #6

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> We could surely add magic record types, but how would such a dance be
> performed while keeping compatibility with existing JGit clients?

Yes.  It is exactly the point of the question I asked.  If it is
simple and easy to add such a new type that is ignored/skipped by
existing clients, then we can go that route.  If it is simple and
easy to add a new bit per ref that existing clients would not barf,
we can use that as an alternative implementation strategy.

And if neither is possible, and there is no other viable third way,
then what I wrote in the part you omitted from your quote still
stands, which was:

>> It is probably a lot more sensible to fail refs_create_reflog() and
>> safe_create_reflog() (which is a thin wrapper around the former), if
>> we cannot implement "a reflog can exist and have no entries yet"
>> semantics.

Han-Wen Nienhuys Feb. 3, 2022, 4:02 p.m. UTC | #7

On Tue, Feb 1, 2022 at 11:12 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
> > We could surely add magic record types, but how would such a dance be
> > performed while keeping compatibility with existing JGit clients?
>
> Yes.  It is exactly the point of the question I asked.  If it is
> simple and easy to add such a new type that is ignored/skipped by
> existing clients, then we can go that route.  If it is simple and
> easy to add a new bit per ref that existing clients would not barf,
> we can use that as an alternative implementation strategy.

I'm not sure that there are any JGit clients: I committed reftable
support at the end of 2019. Before that time, we were running it
internally at Google, but only ref storage, and without the posix
part. Reflogs were never stored in refable, and I actually found a
couple of bugs in Shawn's Java code.

Gerrit has increasingly started using Git as a database, and the
packed/loose system is just not a very good database, so that
motivates the work reftable in general. But the folks who run Gerrit
on a POSIX filesystem want to be sure that isn't a fringe feature, so
they only want to start using it once Git itself supports it. So there
is a chicken & egg problem.

It's sad that we have to introduce an existence bit to make things
work, but overall it is probably easier for me to do than trying to
make sense of sequencer.c and how it uses refs/stash@{0}.

Technically, the only obstacle I see is that we'd need to treat an
existence entry especially for the purpose of compaction/gc: we can
discard older entries, but we shouldn't discard the existence bit, no
matter how old it is.

Ævar Arnfjörð Bjarmason Feb. 3, 2022, 5:39 p.m. UTC | #8

On Thu, Feb 03 2022, Han-Wen Nienhuys wrote:

> On Tue, Feb 1, 2022 at 11:12 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>> > We could surely add magic record types, but how would such a dance be
>> > performed while keeping compatibility with existing JGit clients?
>>
>> Yes.  It is exactly the point of the question I asked.  If it is
>> simple and easy to add such a new type that is ignored/skipped by
>> existing clients, then we can go that route.  If it is simple and
>> easy to add a new bit per ref that existing clients would not barf,
>> we can use that as an alternative implementation strategy.
>
> I'm not sure that there are any JGit clients: I committed reftable
> support at the end of 2019. Before that time, we were running it
> internally at Google, but only ref storage, and without the posix
> part. Reflogs were never stored in refable, and I actually found a
> couple of bugs in Shawn's Java code.
>
> Gerrit has increasingly started using Git as a database, and the
> packed/loose system is just not a very good database, so that
> motivates the work reftable in general. But the folks who run Gerrit
> on a POSIX filesystem want to be sure that isn't a fringe feature, so
> they only want to start using it once Git itself supports it. So there
> is a chicken & egg problem.
>
> It's sad that we have to introduce an existence bit to make things
> work, but overall it is probably easier for me to do than trying to
> make sense of sequencer.c and how it uses refs/stash@{0}.
>
> Technically, the only obstacle I see is that we'd need to treat an
> existence entry especially for the purpose of compaction/gc: we can
> discard older entries, but we shouldn't discard the existence bit, no
> matter how old it is.

Ah, that's very informative. I had been assuming (or misremembered) that
reftable was already seeing production use at Google. Perhaps I
remembering the now-dead Google Code (or whatever it was called). Maybe
not.

In any case, not being locked into the format as specified is very
nice. So is it basically seeing no (production) use anywhere as far as
you know? Whether that's in production at Google, or some third parties
via JGit-something (maybe as editor libraries?).

Taking a bit of a step back.

I do think that generally speaking parts of this series are putting the
cart before the horse in seemingly trying to get the test suite clean
before we have the integration in-tree.

Not everything you have here, but some of it.

I know I'm the one who started encouraging you to work towards getting
the test mode passing, but I think that while it's good to mark some
obviously file-only tests beforehand, anything where we have different
behavior under reftable should really come after.

Because then we can positively assert what we do differently, not just
skip an existing test.

And yes, for many tests that will require rewriting their setup, because
they conflate things that are backend-independent, such as the general
question of "can we ask about reflog existence?" with the implementation
detail of the test setup, which oftentimes is file-backend specific.

Of course that will mean we'll have some interim period where our test
suite is a dumpster fire under GIT_TEST_REFTABLE=true, and I think
that's fine, as long as we work towards getting it passing, and as long
as the non-stability of the nascent backend is very prominently
advertised in the interim.

I.e. I think *the* issue with the original series you had in this regard
was that git-init.txt (or whatever it was) basically just discussed
enabling reftable matter-of-factly, when we were still failing
tens/hundreds of tests, which is just setting up a big bear trap for
users to step into.

But if we just changed those docs a bit to note "!!WARNING WARNING!!
EXPERIMENTAL AND UNSTABLE !!WARNING WARNING!!" or whatever we could
merge the API integration parts sooner than later, even with a lot of
known-broken tests.

We could then whitelist the broken parts, and work on narrowing that set
down. Similar what the SANITIZE=leak mode is currently doing for memory
leaks.

I think that would make things a lot easier when reviewing submissions
like these, in that we have reftable/* in-tree already, but with the
"real" integration we could check how files/reftable backends behave,
add the diverging behavior to tests etc.

What do you think?

Han-Wen Nienhuys Feb. 3, 2022, 6:10 p.m. UTC | #9

On Thu, Feb 3, 2022 at 6:53 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> >> Yes.  It is exactly the point of the question I asked.  If it is
> >> simple and easy to add such a new type that is ignored/skipped by
> >> existing clients, then we can go that route.  If it is simple and
> >> easy to add a new bit per ref that existing clients would not barf,
> >> we can use that as an alternative implementation strategy.
> >
> > I'm not sure that there are any JGit clients: I committed reftable
> > support at the end of 2019. Before that time, we were running it
> > internally at Google, but only ref storage, and without the posix
> > part. Reflogs were never stored in refable, and I actually found a
> > couple of bugs in Shawn's Java code.
> >
> > Gerrit has increasingly started using Git as a database, and the
> > packed/loose system is just not a very good database, so that
> > motivates the work reftable in general. But the folks who run Gerrit
> > on a POSIX filesystem want to be sure that isn't a fringe feature, so
> > they only want to start using it once Git itself supports it. So there
> > is a chicken & egg problem.
> >
> > It's sad that we have to introduce an existence bit to make things
> > work, but overall it is probably easier for me to do than trying to
> > make sense of sequencer.c and how it uses refs/stash@{0}.
> >
> > Technically, the only obstacle I see is that we'd need to treat an
> > existence entry especially for the purpose of compaction/gc: we can
> > discard older entries, but we shouldn't discard the existence bit, no
> > matter how old it is.
>
> Ah, that's very informative. I had been assuming (or misremembered) that
> reftable was already seeing production use at Google. Perhaps I
> remembering the now-dead Google Code (or whatever it was called). Maybe
> not.

We use the format (the JGit code) at Google, but we only use it to
store refs, that is, the refname => {SHA1, symref, tag} mapping. We
currently don't store reflog data in reftable, and the bugs I found
were just in the reflog parts.

We store the tables in bigtable (among others), so the part that does
the POSIX file locking is new (basically, everything in
reftable/stack.c and its equivalent in JGit).

So we are locked into the format to some degree.

For the existence bit, I think I could simply record a $zeroid =>
$zeroid ref update in ref log and treat that specially.

> In any case, not being locked into the format as specified is very
> nice. So is it basically seeing no (production) use anywhere as far as
> you know? Whether that's in production at Google, or some third parties
> via JGit-something (maybe as editor libraries?).

I think our friends at Gerritforge have been experimenting with it,
but not in a production setting, AFAIK. Luca might confirm.

> Taking a bit of a step back.
>
> I do think that generally speaking parts of this series are putting the
> cart before the horse in seemingly trying to get the test suite clean
> before we have the integration in-tree.
>
> Not everything you have here, but some of it.
>
> I know I'm the one who started encouraging you to work towards getting
> the test mode passing, but I think that while it's good to mark some
> obviously file-only tests beforehand, anything where we have different
> behavior under reftable should really come after.

(I can't parse your last sentence)

> Of course that will mean we'll have some interim period where our test
> suite is a dumpster fire under GIT_TEST_REFTABLE=true, and I think

It's actually not that bad. By my last count, there were 38 files with
test failures.

> that's fine, as long as we work towards getting it passing, and as long
> as the non-stability of the nascent backend is very prominently
> advertised in the interim.
>
> I.e. I think *the* issue with the original series you had in this regard
> was that git-init.txt (or whatever it was) basically just discussed
> enabling reftable matter-of-factly, when we were still failing
> tens/hundreds of tests, which is just setting up a big bear trap for
> users to step into.

I read that comment, and I removed that long ago. Right now the only
way to get a reftable is to say GIT_TEST_REFTABLE=1 on init.

> But if we just changed those docs a bit to note "!!WARNING WARNING!!
> EXPERIMENTAL AND UNSTABLE !!WARNING WARNING!!" or whatever we could
> merge the API integration parts sooner than later, even with a lot of
> known-broken tests.
>
> We could then whitelist the broken parts, and work on narrowing that set
> down. Similar what the SANITIZE=leak mode is currently doing for memory
> leaks.
>
> I think that would make things a lot easier when reviewing submissions
> like these, in that we have reftable/* in-tree already, but with the
> "real" integration we could check how files/reftable backends behave,
> add the diverging behavior to tests etc.
>
> What do you think?

Sounds more fun than the current model :-)

The latest version of the code is here:
https://github.com/hanwen/git/tree/merged-seen-20220117

What do you need besides the "RFC: reftable backend" commit?

Junio C Hamano Feb. 3, 2022, 11:06 p.m. UTC | #10

Han-Wen Nienhuys <hanwen@google.com> writes:

> Technically, the only obstacle I see is that we'd need to treat an
> existence entry especially for the purpose of compaction/gc: we can
> discard older entries, but we shouldn't discard the existence bit, no
> matter how old it is.

I was hoping that we already have a type of block that can be used
to record an attribute on the ref (other than its value) and it
would be just the matter of stealing one unused bit from such a
record per ref to say "when answering 'does this ref have reflog?'
say yes even when there is no log record for that refname".  Or the
table format is extensible enough that we can add such a block
without breaking existing clients.

If we have to implement the "this ref has (enabled) reflog, even
though there may not be any log record for it right now" bit as a
fake log record, then yes, we'd have to teach the log iterator to
skip such an entry and we'd have to teach the expiry logic that they
are not to be expired.  It certainly is a sad design than being able
to express the bit in a more direct way (like file backend does,
which is "presence of the reflog file gives the precense of the log,
contents of the reflog file are the actual logs).

Han-Wen Nienhuys Feb. 7, 2022, 9:48 a.m. UTC | #11

On Fri, Feb 4, 2022 at 12:06 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Han-Wen Nienhuys <hanwen@google.com> writes:
>
> > Technically, the only obstacle I see is that we'd need to treat an
> > existence entry especially for the purpose of compaction/gc: we can
> > discard older entries, but we shouldn't discard the existence bit, no
> > matter how old it is.
>
> I was hoping that we already have a type of block that can be used
> to record an attribute on the ref (other than its value) and it
> would be just the matter of stealing one unused bit from such a
> record per ref to say "when answering 'does this ref have reflog?'
> say yes even when there is no log record for that refname".  Or the
> table format is extensible enough that we can add such a block
> without breaking existing clients.

That place doesn't exist, unfortunately, but even if it did, having a
special reflog entry indicating existence is a better solution all
around, I think. A separate per-ref bit allows for data
inconsistencies: what if the bit says "there is no reflog", but we
actually do have reflog entries in the 'g' section?

It also has less chances of creating complicated control flows
(especially in JGit which wasn't designed for this bit from the
start): the tables have to be written in lexicographic order, so you
only can write this bit after you know if reflog entries were written
for a certain ref.

Han-Wen Nienhuys Feb. 7, 2022, 4:52 p.m. UTC | #12

On Mon, Feb 7, 2022 at 10:48 AM Han-Wen Nienhuys <hanwen@google.com> wrote:
>
> On Fri, Feb 4, 2022 at 12:06 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Han-Wen Nienhuys <hanwen@google.com> writes:
> >
> > > Technically, the only obstacle I see is that we'd need to treat an
> > > existence entry especially for the purpose of compaction/gc: we can
> > > discard older entries, but we shouldn't discard the existence bit, no
> > > matter how old it is.
> >
> > I was hoping that we already have a type of block that can be used
> > to record an attribute on the ref (other than its value) and it
> > would be just the matter of stealing one unused bit from such a
> > record per ref to say "when answering 'does this ref have reflog?'
> > say yes even when there is no log record for that refname".  Or the
> > table format is extensible enough that we can add such a block
> > without breaking existing clients.
>
> That place doesn't exist, unfortunately, but even if it did, having a
> special reflog entry indicating existence is a better solution all
> around, I think. A separate per-ref bit allows for data
> inconsistencies: what if the bit says "there is no reflog", but we
> actually do have reflog entries in the 'g' section?
>
> It also has less chances of creating complicated control flows
> (especially in JGit which wasn't designed for this bit from the
> start): the tables have to be written in lexicographic order, so you
> only can write this bit after you know if reflog entries were written
> for a certain ref.

Correction. I wish the table blocks were written in lexicographic
order, but they are written in order 'g', ['i',] 'o', ['i'], 'g',
['i']. Since the 'g' block is last within a table, we could add a new
section at the end.  My point that this is considerable work to think
through how to make this work with JGit still stands, though.

Junio C Hamano Feb. 7, 2022, 11:40 p.m. UTC | #13

Han-Wen Nienhuys <hanwen@google.com> writes:

>> It also has less chances of creating complicated control flows
>> (especially in JGit which wasn't designed for this bit from the
>> start): the tables have to be written in lexicographic order, so you
>> only can write this bit after you know if reflog entries were written
>> for a certain ref.
>
> Correction. I wish the table blocks were written in lexicographic
> order, but they are written in order 'g', ['i',] 'o', ['i'], 'g',
> ['i']. Since the 'g' block is last within a table, we could add a new
> section at the end.  My point that this is considerable work to think
> through how to make this work with JGit still stands, though.

As long as a fake/NULL entry in the reflog is invisible to iterators
and does not count as part of numbered entries when reflog@{23}
notation is used, I think it is perfectly fine to take that
approach, instead of "separate bit".  I brought it up only as a
possible alternative (i.e. "if bit is on or any entry exists, we do
have log for the ref") in case ignoring the fake entry is impossible.

Han-Wen Nienhuys Feb. 8, 2022, 2:58 p.m. UTC | #14

On Tue, Feb 8, 2022 at 12:40 AM Junio C Hamano <gitster@pobox.com> wrote:
> >> It also has less chances of creating complicated control flows
> >> (especially in JGit which wasn't designed for this bit from the
> >> start): the tables have to be written in lexicographic order, so you
> >> only can write this bit after you know if reflog entries were written
> >> for a certain ref.
> >
> > Correction. I wish the table blocks were written in lexicographic
> > order, but they are written in order 'g', ['i',] 'o', ['i'], 'g',
> > ['i']. Since the 'g' block is last within a table, we could add a new
> > section at the end.  My point that this is considerable work to think
> > through how to make this work with JGit still stands, though.
>
> As long as a fake/NULL entry in the reflog is invisible to iterators
> and does not count as part of numbered entries when reflog@{23}
> notation is used, I think it is perfectly fine to take that
> approach, instead of "separate bit".  I brought it up only as a
> possible alternative (i.e. "if bit is on or any entry exists, we do
> have log for the ref") in case ignoring the fake entry is impossible.

I implemented this. It was very clean and easy; I didn't yet check if
JGit can handle it, but if it doesn't, it should be easy to fix.

You can drop patch 2/3 (ie. this subject line).

[2/3] t1405: mark test that checks existence as REFFILES

Commit Message

Comments

Patch