Message ID | 1ded69d89709d23147b29f67122b659293414405.1643651420.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 53af25e47c5f1261c7da47ca4e01ed9d9f3ffcab |
Headers | show |
Series | reftable related test tweaks | expand |
On Mon, Jan 31, 2022 at 05:50:19PM +0000, Han-Wen Nienhuys via GitGitGadget wrote: > From: Han-Wen Nienhuys <hanwen@google.com> > > The reftable backend doesn't support mere existence of reflogs. Perhaps I'm missing something obvious, but this and the previous patch seem to be conflicting each other. My understanding of the previous change is that you wanted a reflog entry when the REFFILES prerequisite isn't met. But this patch says what matches my understanding is that reftable and reflogs do not play together. If reflogs do not interact with the reftable backend, then what does this patch do? Thanks, Taylor
Taylor Blau <me@ttaylorr.com> writes: > On Mon, Jan 31, 2022 at 05:50:19PM +0000, Han-Wen Nienhuys via GitGitGadget wrote: >> From: Han-Wen Nienhuys <hanwen@google.com> >> >> The reftable backend doesn't support mere existence of reflogs. > > Perhaps I'm missing something obvious, but this and the previous patch > seem to be conflicting each other. > > My understanding of the previous change is that you wanted a reflog > entry when the REFFILES prerequisite isn't met. But this patch says what > matches my understanding is that reftable and reflogs do not play > together. > > If reflogs do not interact with the reftable backend, then what does > this patch do? One difference between the files and the reftable backend is that with the files backend, you can say "I am not adding any entry yet, but remember that reflog is enabled for this ref, while all other refs reflog is not enabled", and the way to do so is to touch the "$GIT_DIR/logs/refs/heads/frotz" file---this enables reflog for the "frotz" branch, even if core.logAllRefUpdates is not set. Because there is no generic reflog API that says "enable log for this ref", a test that checks this feature with files backend would do "touch .git/refs/heads/frotz". I didn't look at "this patch", but it is understandable if such a test needs to be skipped via REFFILES prerequisite, because the reftable backend lacks this feature.
On Mon, Jan 31, 2022 at 11:15 PM Junio C Hamano <gitster@pobox.com> wrote: > > Taylor Blau <me@ttaylorr.com> writes: > > > On Mon, Jan 31, 2022 at 05:50:19PM +0000, Han-Wen Nienhuys via GitGitGadget wrote: > >> From: Han-Wen Nienhuys <hanwen@google.com> > >> > >> The reftable backend doesn't support mere existence of reflogs. > > > > Perhaps I'm missing something obvious, but this and the previous patch > > seem to be conflicting each other. > > > > My understanding of the previous change is that you wanted a reflog > > entry when the REFFILES prerequisite isn't met. But this patch says what > > matches my understanding is that reftable and reflogs do not play > > together. > > > > If reflogs do not interact with the reftable backend, then what does > > this patch do? > > One difference between the files and the reftable backend is that > with the files backend, you can say "I am not adding any entry yet, > but remember that reflog is enabled for this ref, while all other > refs reflog is not enabled", and the way to do so is to touch the > "$GIT_DIR/logs/refs/heads/frotz" file---this enables reflog for the > "frotz" branch, even if core.logAllRefUpdates is not set. > > Because there is no generic reflog API that says "enable log for > this ref", a test that checks this feature with files backend would > do "touch .git/refs/heads/frotz". There is refs_create_reflog(), so the generic reflog API exists. The problem is that there is no sensible way to implement it in reftable. One option is (reflog exists == there exists at least one reflog entry for the ref). This messes up the test from this patch, because it creates a reflog, but because it doesn't populate the reflog, so we return false for git-reflog-exists. It also turns out to mess up the tests in t3420, as follows: ++ git stash show -p error: refs/stash@{0} is not a valid reference I get reflog_exists: refs/stash: 0 and "git stash show -p" aborts with "error: refs/stash@{0} is not a valid reference". So I now went with the other option, ie. (reflog exists == true), ie. every conceivable ref has a reflog (but most are empty). This makes t3420 pass. This behavior also confuses t1405, because in $RUN delete-reflog HEAD && test_must_fail git reflog exists HEAD the last command now always returns true.
Han-Wen Nienhuys <hanwen@google.com> writes: >> Because there is no generic reflog API that says "enable log for >> this ref", a test that checks this feature with files backend would >> do "touch .git/refs/heads/frotz". > > There is refs_create_reflog(), so the generic reflog API exists. The > problem is that there is no sensible way to implement it in reftable. Ah, yes, that's correct. > One option is (reflog exists == there exists at least one reflog entry > for the ref). Because the current callers of refs_create_reflog() does want a reflog created that does not give any entry when iterated, I agree with you that adding a "fake" reflog entry alone is not a sufficient emulation of the API. I think these are all ... > This messes up the test from this patch, because it > creates a reflog, but because it doesn't populate the reflog, so we > return false for git-reflog-exists. > > It also turns out to mess up the tests in t3420, as follows: > > ++ git stash show -p > error: refs/stash@{0} is not a valid reference > > I get > > reflog_exists: refs/stash: 0 > > and "git stash show -p" aborts with "error: refs/stash@{0} is not a > valid reference". ... indications of hat. I wonder if it is simple and easy to add a new reflog entry type used as an implementation detail of the reftable. If we can do so, then, the reftable backend integrated to the ref API can do these things: - reflog_exists() can say yes when one reflog entry of any type (internal to the reftable implementation) exists for the ref; - create_reflog() can add a reflog entry of the "fake" type (internal to the reftable implementation); - for_each_reflog_ent() and its reverse can learn to skip such a fake reflog entry. As there is no way to ask, via the API, the number of the existing reflog entries, the ref API callers would not be able to tell such an implementation detail that with reftable backend, create_reflog() does not create an empty reflog. To them, a reflog created with the API call would truly be empty as iterators will not return anything. Or do we have a list of refs kept somewhere in the reftable data structure in a separate chunk? Do we have a bit for each of these refs to record if the log is enabled for it? Then instead of the fake reflog entry, we could implement the necessary semantics a lot more cleanly: - reflog_exists() can just peek the bit. - create_reflog() can just flip the bit. - there is no need to touch the iterators. - the equivalent to files_log_ref_write() can decide based on the bit (i.e. what reflog_exists() says) whether to log changes to the ref. It is probably a lot more sensible to fail refs_create_reflog() and safe_create_reflog() (which is a thin wrapper around the former), if we cannot implement "a reflog can exist and have no entries yet" semantics. Outside the test helper, the only place the helper is used is "checkout -l" when should_autocreate_reflog() returns false, which should be rare (as we are updating a branch, it should be either a detached HEAD or a ref under refs/heads/), so it would not be a huge practical downside if we cannot prepare an empty reflog anyway, I would think. Thanks.
On Tue, Feb 01 2022, Junio C Hamano wrote: > Han-Wen Nienhuys <hanwen@google.com> writes: > >>> Because there is no generic reflog API that says "enable log for >>> this ref", a test that checks this feature with files backend would >>> do "touch .git/refs/heads/frotz". >> >> There is refs_create_reflog(), so the generic reflog API exists. The >> problem is that there is no sensible way to implement it in reftable. > > Ah, yes, that's correct. > >> One option is (reflog exists == there exists at least one reflog entry >> for the ref). > > Because the current callers of refs_create_reflog() does want a > reflog created that does not give any entry when iterated, I agree > with you that adding a "fake" reflog entry alone is not a sufficient > emulation of the API. I think these are all ... > >> This messes up the test from this patch, because it >> creates a reflog, but because it doesn't populate the reflog, so we >> return false for git-reflog-exists. >> >> It also turns out to mess up the tests in t3420, as follows: >> >> ++ git stash show -p >> error: refs/stash@{0} is not a valid reference >> >> I get >> >> reflog_exists: refs/stash: 0 >> >> and "git stash show -p" aborts with "error: refs/stash@{0} is not a >> valid reference". > > ... indications of hat. > > I wonder if it is simple and easy to add a new reflog entry type > used as an implementation detail of the reftable. If we can do so, > then, the reftable backend integrated to the ref API can do these > things: > > - reflog_exists() can say yes when one reflog entry of any type > (internal to the reftable implementation) exists for the ref; > > - create_reflog() can add a reflog entry of the "fake" type > (internal to the reftable implementation); > > - for_each_reflog_ent() and its reverse can learn to skip such a > fake reflog entry. > > As there is no way to ask, via the API, the number of the existing > reflog entries, the ref API callers would not be able to tell such > an implementation detail that with reftable backend, create_reflog() > does not create an empty reflog. To them, a reflog created with the > API call would truly be empty as iterators will not return anything. We could surely add magic record types, but how would such a dance be performed while keeping compatibility with existing JGit clients?
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > We could surely add magic record types, but how would such a dance be > performed while keeping compatibility with existing JGit clients? Yes. It is exactly the point of the question I asked. If it is simple and easy to add such a new type that is ignored/skipped by existing clients, then we can go that route. If it is simple and easy to add a new bit per ref that existing clients would not barf, we can use that as an alternative implementation strategy. And if neither is possible, and there is no other viable third way, then what I wrote in the part you omitted from your quote still stands, which was: >> It is probably a lot more sensible to fail refs_create_reflog() and >> safe_create_reflog() (which is a thin wrapper around the former), if >> we cannot implement "a reflog can exist and have no entries yet" >> semantics.
On Tue, Feb 1, 2022 at 11:12 PM Junio C Hamano <gitster@pobox.com> wrote: > > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > > > We could surely add magic record types, but how would such a dance be > > performed while keeping compatibility with existing JGit clients? > > Yes. It is exactly the point of the question I asked. If it is > simple and easy to add such a new type that is ignored/skipped by > existing clients, then we can go that route. If it is simple and > easy to add a new bit per ref that existing clients would not barf, > we can use that as an alternative implementation strategy. I'm not sure that there are any JGit clients: I committed reftable support at the end of 2019. Before that time, we were running it internally at Google, but only ref storage, and without the posix part. Reflogs were never stored in refable, and I actually found a couple of bugs in Shawn's Java code. Gerrit has increasingly started using Git as a database, and the packed/loose system is just not a very good database, so that motivates the work reftable in general. But the folks who run Gerrit on a POSIX filesystem want to be sure that isn't a fringe feature, so they only want to start using it once Git itself supports it. So there is a chicken & egg problem. It's sad that we have to introduce an existence bit to make things work, but overall it is probably easier for me to do than trying to make sense of sequencer.c and how it uses refs/stash@{0}. Technically, the only obstacle I see is that we'd need to treat an existence entry especially for the purpose of compaction/gc: we can discard older entries, but we shouldn't discard the existence bit, no matter how old it is.
On Thu, Feb 03 2022, Han-Wen Nienhuys wrote: > On Tue, Feb 1, 2022 at 11:12 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> >> > We could surely add magic record types, but how would such a dance be >> > performed while keeping compatibility with existing JGit clients? >> >> Yes. It is exactly the point of the question I asked. If it is >> simple and easy to add such a new type that is ignored/skipped by >> existing clients, then we can go that route. If it is simple and >> easy to add a new bit per ref that existing clients would not barf, >> we can use that as an alternative implementation strategy. > > I'm not sure that there are any JGit clients: I committed reftable > support at the end of 2019. Before that time, we were running it > internally at Google, but only ref storage, and without the posix > part. Reflogs were never stored in refable, and I actually found a > couple of bugs in Shawn's Java code. > > Gerrit has increasingly started using Git as a database, and the > packed/loose system is just not a very good database, so that > motivates the work reftable in general. But the folks who run Gerrit > on a POSIX filesystem want to be sure that isn't a fringe feature, so > they only want to start using it once Git itself supports it. So there > is a chicken & egg problem. > > It's sad that we have to introduce an existence bit to make things > work, but overall it is probably easier for me to do than trying to > make sense of sequencer.c and how it uses refs/stash@{0}. > > Technically, the only obstacle I see is that we'd need to treat an > existence entry especially for the purpose of compaction/gc: we can > discard older entries, but we shouldn't discard the existence bit, no > matter how old it is. Ah, that's very informative. I had been assuming (or misremembered) that reftable was already seeing production use at Google. Perhaps I remembering the now-dead Google Code (or whatever it was called). Maybe not. In any case, not being locked into the format as specified is very nice. So is it basically seeing no (production) use anywhere as far as you know? Whether that's in production at Google, or some third parties via JGit-something (maybe as editor libraries?). Taking a bit of a step back. I do think that generally speaking parts of this series are putting the cart before the horse in seemingly trying to get the test suite clean before we have the integration in-tree. Not everything you have here, but some of it. I know I'm the one who started encouraging you to work towards getting the test mode passing, but I think that while it's good to mark some obviously file-only tests beforehand, anything where we have different behavior under reftable should really come after. Because then we can positively assert what we do differently, not just skip an existing test. And yes, for many tests that will require rewriting their setup, because they conflate things that are backend-independent, such as the general question of "can we ask about reflog existence?" with the implementation detail of the test setup, which oftentimes is file-backend specific. Of course that will mean we'll have some interim period where our test suite is a dumpster fire under GIT_TEST_REFTABLE=true, and I think that's fine, as long as we work towards getting it passing, and as long as the non-stability of the nascent backend is very prominently advertised in the interim. I.e. I think *the* issue with the original series you had in this regard was that git-init.txt (or whatever it was) basically just discussed enabling reftable matter-of-factly, when we were still failing tens/hundreds of tests, which is just setting up a big bear trap for users to step into. But if we just changed those docs a bit to note "!!WARNING WARNING!! EXPERIMENTAL AND UNSTABLE !!WARNING WARNING!!" or whatever we could merge the API integration parts sooner than later, even with a lot of known-broken tests. We could then whitelist the broken parts, and work on narrowing that set down. Similar what the SANITIZE=leak mode is currently doing for memory leaks. I think that would make things a lot easier when reviewing submissions like these, in that we have reftable/* in-tree already, but with the "real" integration we could check how files/reftable backends behave, add the diverging behavior to tests etc. What do you think?
On Thu, Feb 3, 2022 at 6:53 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > >> Yes. It is exactly the point of the question I asked. If it is > >> simple and easy to add such a new type that is ignored/skipped by > >> existing clients, then we can go that route. If it is simple and > >> easy to add a new bit per ref that existing clients would not barf, > >> we can use that as an alternative implementation strategy. > > > > I'm not sure that there are any JGit clients: I committed reftable > > support at the end of 2019. Before that time, we were running it > > internally at Google, but only ref storage, and without the posix > > part. Reflogs were never stored in refable, and I actually found a > > couple of bugs in Shawn's Java code. > > > > Gerrit has increasingly started using Git as a database, and the > > packed/loose system is just not a very good database, so that > > motivates the work reftable in general. But the folks who run Gerrit > > on a POSIX filesystem want to be sure that isn't a fringe feature, so > > they only want to start using it once Git itself supports it. So there > > is a chicken & egg problem. > > > > It's sad that we have to introduce an existence bit to make things > > work, but overall it is probably easier for me to do than trying to > > make sense of sequencer.c and how it uses refs/stash@{0}. > > > > Technically, the only obstacle I see is that we'd need to treat an > > existence entry especially for the purpose of compaction/gc: we can > > discard older entries, but we shouldn't discard the existence bit, no > > matter how old it is. > > Ah, that's very informative. I had been assuming (or misremembered) that > reftable was already seeing production use at Google. Perhaps I > remembering the now-dead Google Code (or whatever it was called). Maybe > not. We use the format (the JGit code) at Google, but we only use it to store refs, that is, the refname => {SHA1, symref, tag} mapping. We currently don't store reflog data in reftable, and the bugs I found were just in the reflog parts. We store the tables in bigtable (among others), so the part that does the POSIX file locking is new (basically, everything in reftable/stack.c and its equivalent in JGit). So we are locked into the format to some degree. For the existence bit, I think I could simply record a $zeroid => $zeroid ref update in ref log and treat that specially. > In any case, not being locked into the format as specified is very > nice. So is it basically seeing no (production) use anywhere as far as > you know? Whether that's in production at Google, or some third parties > via JGit-something (maybe as editor libraries?). I think our friends at Gerritforge have been experimenting with it, but not in a production setting, AFAIK. Luca might confirm. > Taking a bit of a step back. > > I do think that generally speaking parts of this series are putting the > cart before the horse in seemingly trying to get the test suite clean > before we have the integration in-tree. > > Not everything you have here, but some of it. > > I know I'm the one who started encouraging you to work towards getting > the test mode passing, but I think that while it's good to mark some > obviously file-only tests beforehand, anything where we have different > behavior under reftable should really come after. (I can't parse your last sentence) > Of course that will mean we'll have some interim period where our test > suite is a dumpster fire under GIT_TEST_REFTABLE=true, and I think It's actually not that bad. By my last count, there were 38 files with test failures. > that's fine, as long as we work towards getting it passing, and as long > as the non-stability of the nascent backend is very prominently > advertised in the interim. > > I.e. I think *the* issue with the original series you had in this regard > was that git-init.txt (or whatever it was) basically just discussed > enabling reftable matter-of-factly, when we were still failing > tens/hundreds of tests, which is just setting up a big bear trap for > users to step into. I read that comment, and I removed that long ago. Right now the only way to get a reftable is to say GIT_TEST_REFTABLE=1 on init. > But if we just changed those docs a bit to note "!!WARNING WARNING!! > EXPERIMENTAL AND UNSTABLE !!WARNING WARNING!!" or whatever we could > merge the API integration parts sooner than later, even with a lot of > known-broken tests. > > We could then whitelist the broken parts, and work on narrowing that set > down. Similar what the SANITIZE=leak mode is currently doing for memory > leaks. > > I think that would make things a lot easier when reviewing submissions > like these, in that we have reftable/* in-tree already, but with the > "real" integration we could check how files/reftable backends behave, > add the diverging behavior to tests etc. > > What do you think? Sounds more fun than the current model :-) The latest version of the code is here: https://github.com/hanwen/git/tree/merged-seen-20220117 What do you need besides the "RFC: reftable backend" commit?
Han-Wen Nienhuys <hanwen@google.com> writes: > Technically, the only obstacle I see is that we'd need to treat an > existence entry especially for the purpose of compaction/gc: we can > discard older entries, but we shouldn't discard the existence bit, no > matter how old it is. I was hoping that we already have a type of block that can be used to record an attribute on the ref (other than its value) and it would be just the matter of stealing one unused bit from such a record per ref to say "when answering 'does this ref have reflog?' say yes even when there is no log record for that refname". Or the table format is extensible enough that we can add such a block without breaking existing clients. If we have to implement the "this ref has (enabled) reflog, even though there may not be any log record for it right now" bit as a fake log record, then yes, we'd have to teach the log iterator to skip such an entry and we'd have to teach the expiry logic that they are not to be expired. It certainly is a sad design than being able to express the bit in a more direct way (like file backend does, which is "presence of the reflog file gives the precense of the log, contents of the reflog file are the actual logs).
On Fri, Feb 4, 2022 at 12:06 AM Junio C Hamano <gitster@pobox.com> wrote: > > Han-Wen Nienhuys <hanwen@google.com> writes: > > > Technically, the only obstacle I see is that we'd need to treat an > > existence entry especially for the purpose of compaction/gc: we can > > discard older entries, but we shouldn't discard the existence bit, no > > matter how old it is. > > I was hoping that we already have a type of block that can be used > to record an attribute on the ref (other than its value) and it > would be just the matter of stealing one unused bit from such a > record per ref to say "when answering 'does this ref have reflog?' > say yes even when there is no log record for that refname". Or the > table format is extensible enough that we can add such a block > without breaking existing clients. That place doesn't exist, unfortunately, but even if it did, having a special reflog entry indicating existence is a better solution all around, I think. A separate per-ref bit allows for data inconsistencies: what if the bit says "there is no reflog", but we actually do have reflog entries in the 'g' section? It also has less chances of creating complicated control flows (especially in JGit which wasn't designed for this bit from the start): the tables have to be written in lexicographic order, so you only can write this bit after you know if reflog entries were written for a certain ref.
On Mon, Feb 7, 2022 at 10:48 AM Han-Wen Nienhuys <hanwen@google.com> wrote: > > On Fri, Feb 4, 2022 at 12:06 AM Junio C Hamano <gitster@pobox.com> wrote: > > > > Han-Wen Nienhuys <hanwen@google.com> writes: > > > > > Technically, the only obstacle I see is that we'd need to treat an > > > existence entry especially for the purpose of compaction/gc: we can > > > discard older entries, but we shouldn't discard the existence bit, no > > > matter how old it is. > > > > I was hoping that we already have a type of block that can be used > > to record an attribute on the ref (other than its value) and it > > would be just the matter of stealing one unused bit from such a > > record per ref to say "when answering 'does this ref have reflog?' > > say yes even when there is no log record for that refname". Or the > > table format is extensible enough that we can add such a block > > without breaking existing clients. > > That place doesn't exist, unfortunately, but even if it did, having a > special reflog entry indicating existence is a better solution all > around, I think. A separate per-ref bit allows for data > inconsistencies: what if the bit says "there is no reflog", but we > actually do have reflog entries in the 'g' section? > > It also has less chances of creating complicated control flows > (especially in JGit which wasn't designed for this bit from the > start): the tables have to be written in lexicographic order, so you > only can write this bit after you know if reflog entries were written > for a certain ref. Correction. I wish the table blocks were written in lexicographic order, but they are written in order 'g', ['i',] 'o', ['i'], 'g', ['i']. Since the 'g' block is last within a table, we could add a new section at the end. My point that this is considerable work to think through how to make this work with JGit still stands, though.
Han-Wen Nienhuys <hanwen@google.com> writes: >> It also has less chances of creating complicated control flows >> (especially in JGit which wasn't designed for this bit from the >> start): the tables have to be written in lexicographic order, so you >> only can write this bit after you know if reflog entries were written >> for a certain ref. > > Correction. I wish the table blocks were written in lexicographic > order, but they are written in order 'g', ['i',] 'o', ['i'], 'g', > ['i']. Since the 'g' block is last within a table, we could add a new > section at the end. My point that this is considerable work to think > through how to make this work with JGit still stands, though. As long as a fake/NULL entry in the reflog is invisible to iterators and does not count as part of numbered entries when reflog@{23} notation is used, I think it is perfectly fine to take that approach, instead of "separate bit". I brought it up only as a possible alternative (i.e. "if bit is on or any entry exists, we do have log for the ref") in case ignoring the fake entry is impossible.
On Tue, Feb 8, 2022 at 12:40 AM Junio C Hamano <gitster@pobox.com> wrote: > >> It also has less chances of creating complicated control flows > >> (especially in JGit which wasn't designed for this bit from the > >> start): the tables have to be written in lexicographic order, so you > >> only can write this bit after you know if reflog entries were written > >> for a certain ref. > > > > Correction. I wish the table blocks were written in lexicographic > > order, but they are written in order 'g', ['i',] 'o', ['i'], 'g', > > ['i']. Since the 'g' block is last within a table, we could add a new > > section at the end. My point that this is considerable work to think > > through how to make this work with JGit still stands, though. > > As long as a fake/NULL entry in the reflog is invisible to iterators > and does not count as part of numbered entries when reflog@{23} > notation is used, I think it is perfectly fine to take that > approach, instead of "separate bit". I brought it up only as a > possible alternative (i.e. "if bit is on or any entry exists, we do > have log for the ref") in case ignoring the fake entry is impossible. I implemented this. It was very clean and easy; I didn't yet check if JGit can handle it, but if it doesn't, it should be easy to fix. You can drop patch 2/3 (ie. this subject line).
diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh index 62e5e9d1b0a..51f82916281 100755 --- a/t/t1405-main-ref-store.sh +++ b/t/t1405-main-ref-store.sh @@ -111,7 +111,7 @@ test_expect_success 'delete_reflog(HEAD)' ' test_must_fail git reflog exists HEAD ' -test_expect_success 'create-reflog(HEAD)' ' +test_expect_success REFFILES 'create-reflog(HEAD)' ' $RUN create-reflog HEAD && git reflog exists HEAD '