Message ID | 259de62b26302c10f429d52bec42a8a954bc5ba3.1638886972.git.ps@pks.im (mailing list archive)
---|---
State | New, archived
Series | [RFC] fetch: update refs in a single transaction
On Tue, Dec 7, 2021 at 11:11 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> When git-fetch(1) isn't called with the `--atomic` flag, then each
> reference will be updated in a separate transaction. As a result, even
> if we failed to update a subset of transactions, the remaining refs will
> be modified after the command has finished. While this pattern is
> reasonably efficient with the files backend where we'd lock and write
> each file separately anyway, the upcoming reftable backend could handle
> such an update a lot more efficiently if it was batched into a single
> transaction given that it could then create a single new reftable slice
> instead of creating one slice per updated ref. While this inefficiency
> can be easily mitigated by using the `--atomic` flag, this flag cannot
> be used in contexts where we want partial-update semantics.

So it seems to me that there are 3 possible behaviors/modes:

- "atomic": a single transaction with all-or-nothing semantics,
  currently implemented with --atomic
- "partial-update": a single transaction with partial-update semantics,
  the new behavior this patch is implementing
- "non-atomic": many transactions (one per ref?), currently the default
  when --atomic isn't passed

Yeah, "partial-update" seems better than "non-atomic" when the reftable
backend is used. But what happens when "partial-update" is used with the
files backend?

> Convert git-fetch(1) to always use a single reference transaction,
> regardless of whether it is called with `--atomic` or not. The only
> remaining difference between both modes is that with `--atomic` set,
> we'd abort the transaction in case at least one reference cannot be
> updated.

Nit: I would say "as soon as one reference cannot be updated"

> Note that this slightly changes semantics of git-fetch(1): if we hit any
> unexpected errors like the reference update racing with another process,
> then we'll now fail to update any references, too.

So that's one difference between the "partial-update" and the
"non-atomic" modes, but what you say doesn't really make me confident
that it's the only one.

Also are there people who are in cases where a lot of reference
updates are racing, and where it's important that in such cases at
least some ref updates succeed?

If yes, then maybe the 3 modes could be kept and options for
"partial-update" and "non-atomic" could be added. "partial-update"
could be the default in case the reftable backend is used, while
"non-atomic" would still be the default when the files backend is
used.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/fetch.c | 78 ++++++++++++++++---------------------------------
>  1 file changed, 25 insertions(+), 53 deletions(-)
>
> Hi,
>
> until now, we have been quite lenient with creating lots of reference
> transactions even though they could've been bundled together into a
> single transaction. After all, it didn't have much of a downside in most
> contexts with the files backend: we'd have to lock each loose ref
> separately anyway. I'd like to get some feedback on changing our stance
> here, due to multiple reasons:
>
>   - The upcoming reftable backend will be more efficient if we use a
>     single transaction to bundle together multiple ref updates given
>     that it only needs to write one new slice instead of one per
>     update.
>
>   - Syncing refs to disk can be batched more efficiently if we bundle
>     ref updates. See my initial patch series to implement fsync-ing
>     refs [1] and Peff's benchmarks [2] demonstrating that fetches may
>     become a lot slower.
Maybe it's explained in the pointers, but is this in the case of the
files backend or the reftable backend or both?

>   - The reference-transaction hook can be used more efficiently given
>     that it would only need to execute twice, instead of twice per
>     updated ref.

Yeah, so that could be a regression for people who expect it to run
twice per updated ref.

> It also has a more global view of what's happening.
> While this is a concern of mine, it's a non-reason in the context
> of the Git project given that we really ought not to change
> semantics only to appease this hook.

Not sure I understand what you are saying here.

> With these reasons in mind, I'm wondering whether we want to accept
> refactorings which convert existing code to use batched reference
> transactions. While the potential performance improvements are a rather
> obvious upside, the downside is that it would definitely change the
> failure mode in many cases.

Another downside is that it changes how and when reference-transaction
hooks are called.

> The example I have here with git-fetch(1) would mean that if we were to
> race with any other process or if any other unexpected error occurs
> which leads us to die, then we'll not commit any change to disk. This
> can be seen as an improvement in consistency, but it can also be seen as
> a change which breaks current semantics of trying to do as much work as
> possible.

I like the idea, but it seems a bit safer to me to have 3 different
modes, so that people can test them in real life first for some time.
Then we might later be more confident with changing the default for
some backends. I might be worrying too much though, as very few people
probably use reference-transaction hooks.
On Wed, Dec 08, 2021 at 09:15:01AM +0100, Christian Couder wrote:
> On Tue, Dec 7, 2021 at 11:11 PM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > When git-fetch(1) isn't called with the `--atomic` flag, then each
> > reference will be updated in a separate transaction. As a result, even
> > if we failed to update a subset of transactions, the remaining refs will
> > be modified after the command has finished. While this pattern is
> > reasonably efficient with the files backend where we'd lock and write
> > each file separately anyway, the upcoming reftable backend could handle
> > such an update a lot more efficiently if it was batched into a single
> > transaction given that it could then create a single new reftable slice
> > instead of creating one slice per updated ref. While this inefficiency
> > can be easily mitigated by using the `--atomic` flag, this flag cannot
> > be used in contexts where we want partial-update semantics.
>
> So it seems to me that there are 3 possible behaviors/modes:
>
> - "atomic": a single transaction with all-or-nothing semantics,
>   currently implemented with --atomic
> - "partial-update": a single transaction with partial-update semantics,
>   the new behavior this patch is implementing
> - "non-atomic": many transactions (one per ref?), currently the default
>   when --atomic isn't passed
>
> Yeah, "partial-update" seems better than "non-atomic" when the
> reftable backend is used. But what happens when "partial-update" is
> used with the files backend?

Yeah, putting this in these three modes catches the gist of how it can
work. Currently, we only support "atomic" and "non-atomic", and this
patch proposes to replace "non-atomic" with "partial-update". I'd be
happy to instead implement it as a third mode, but if we can find a way
to "do the right thing" without having to introduce another option, then
that would be best. But as per your comments below and my own concerns
I'm not sure we can easily implement this without breaking existing
usecases.

> > Convert git-fetch(1) to always use a single reference transaction,
> > regardless of whether it is called with `--atomic` or not. The only
> > remaining difference between both modes is that with `--atomic` set,
> > we'd abort the transaction in case at least one reference cannot be
> > updated.
>
> Nit: I would say "as soon as one reference cannot be updated"
>
> > Note that this slightly changes semantics of git-fetch(1): if we hit any
> > unexpected errors like the reference update racing with another process,
> > then we'll now fail to update any references, too.
>
> So that's one difference between the "partial-update" and the
> "non-atomic" modes, but what you say doesn't really make me confident
> that it's the only one.

There can certainly be other edge cases which break, and I'm sure others
will be able to pinpoint exactly which these are.

> Also are there people who are in cases where a lot of reference
> updates are racing, and where it's important that in such cases at
> least some ref updates succeed?
>
> If yes, then maybe the 3 modes could be kept and options for
> "partial-update" and "non-atomic" could be added. "partial-update"
> could be the default in case the reftable backend is used, while
> "non-atomic" would still be the default when the files backend is
> used.

I bet there are.
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  builtin/fetch.c | 78 ++++++++++++++++---------------------------------
> >  1 file changed, 25 insertions(+), 53 deletions(-)
> >
> > Hi,
> >
> > until now, we have been quite lenient with creating lots of reference
> > transactions even though they could've been bundled together into a
> > single transaction. After all, it didn't have much of a downside in most
> > contexts with the files backend: we'd have to lock each loose ref
> > separately anyway. I'd like to get some feedback on changing our stance
> > here, due to multiple reasons:
> >
> >   - The upcoming reftable backend will be more efficient if we use a
> >     single transaction to bundle together multiple ref updates given
> >     that it only needs to write one new slice instead of one per
> >     update.
> >
> >   - Syncing refs to disk can be batched more efficiently if we bundle
> >     ref updates. See my initial patch series to implement fsync-ing
> >     refs [1] and Peff's benchmarks [2] demonstrating that fetches may
> >     become a lot slower.
>
> Maybe it's explained in the pointers, but is this in the case of the
> files backend or the reftable backend or both?

The discussion in [1] was in the case of the files backend: if we want
to guarantee that we never write corrupt refs to disk, then we must sync
written data to disk before moving refs into place. If using multiple
transactions, then we thus need to fsync(3P) in each commit. If we're
using a single transaction though, we can amortize the costs and batch
this, where we only fsync(3P) once after all references have been
prepared.

The same is effectively true for the reftable backend though: if we want
to ensure that reftable files are safe, then we'd have to sync them to
disk, as well.
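To make the amortization concrete, here is a minimal sketch of the two
write patterns (the helper names are hypothetical stand-ins, not the
actual files-backend internals):

#include <stddef.h>

/* Hypothetical stand-ins: write "$ref.lock", flush it, rename into place. */
static void write_lockfile(const char *ref)  { (void)ref; }
static void sync_lockfile(const char *ref)   { (void)ref; /* fsync(3P) */ }
static void commit_lockfile(const char *ref) { (void)ref; /* rename(2) */ }
static void flush_pending_writes(void)       { /* one batched flush */ }

/* One transaction per ref: every commit pays its own sync. */
static void update_refs_separately(const char **refs, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		write_lockfile(refs[i]);
		sync_lockfile(refs[i]);     /* nr sync points in total */
		commit_lockfile(refs[i]);
	}
}

/* One batched transaction: prepare all lockfiles, flush once, commit. */
static void update_refs_batched(const char **refs, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		write_lockfile(refs[i]);
	flush_pending_writes();             /* single sync point for all refs */
	for (size_t i = 0; i < nr; i++)
		commit_lockfile(refs[i]);
}

The second pattern is what a single transaction buys us: the expensive
sync point is shared across all queued refs instead of being paid once
per updated ref.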
> >   - The reference-transaction hook can be used more efficiently given
> >     that it would only need to execute twice, instead of twice per
> >     updated ref.
>
> Yeah, so that could be a regression for people who expect it to run
> twice per updated ref.

The reference-transaction hook was always a mechanism which hooks into
internals. As such, how often the hook executes and in which order was
never guaranteed and is expected to change when implementation details
change. So this is not something I'd ever call a regression: the whole
intent of the hook is to be able to see what happens internally, not to
assert semantics or the order in which updates are performed for a
specific Git command. That's kind of the trade-off with using a
low-level hook like this one.

> > It also has a more global view of what's happening.
> > While this is a concern of mine, it's a non-reason in the context
> > of the Git project given that we really ought not to change
> > semantics only to appease this hook.
>
> Not sure I understand what you are saying here.

Previously you saw `2*len(refs)` executions of the hook, so you'd have
to piece together the puzzle to see what happens on a more global level
yourself. With the change of using a single transaction, you'd get only
two executions of the hook, where each execution has the complete
picture of all refs which are about to be updated.

> > With these reasons in mind, I'm wondering whether we want to accept
> > refactorings which convert existing code to use batched reference
> > transactions. While the potential performance improvements are a rather
> > obvious upside, the downside is that it would definitely change the
> > failure mode in many cases.
>
> Another downside is that it changes how and when reference-transaction
> hooks are called.

As mentioned above, this is not something I'd call a downside. We cannot
ever guarantee stable execution order of this hook, or otherwise we
paint ourselves into a corner.

> > The example I have here with git-fetch(1) would mean that if we were to
> > race with any other process or if any other unexpected error occurs
> > which leads us to die, then we'll not commit any change to disk. This
> > can be seen as an improvement in consistency, but it can also be seen as
> > a change which breaks current semantics of trying to do as much work as
> > possible.
>
> I like the idea, but it seems a bit safer to me to have 3 different
> modes, so that people can test them in real life first for some time.
> Then we might later be more confident with changing the default for
> some backends. I might be worrying too much though, as very few people
> probably use reference-transaction hooks.

Fair, I'll wait a bit for more feedback and then probably post a v2 with
a third mode.

I think changing behaviour depending on which ref backend is in use
would set a bad precedent, though. Ultimately, the user shouldn't notice
which backend is in use, except that it would ideally be faster if using
the new reftable backend. But slightly changing semantics of commands
depending on which backend is active is only going to lead to confusion.

Patrick
On Wed, Dec 08, 2021 at 09:15:01AM +0100, Christian Couder wrote:

> > Note that this slightly changes semantics of git-fetch(1): if we hit any
> > unexpected errors like the reference update racing with another process,
> > then we'll now fail to update any references, too.
>
> So that's one difference between the "partial-update" and the
> "non-atomic" modes, but what you say doesn't really make me confident
> that it's the only one.
>
> Also are there people who are in cases where a lot of reference
> updates are racing, and where it's important that in such cases at
> least some ref updates succeed?
>
> If yes, then maybe the 3 modes could be kept and options for
> "partial-update" and "non-atomic" could be added. "partial-update"
> could be the default in case the reftable backend is used, while
> "non-atomic" would still be the default when the files backend is
> used.

I think these three modes are hard to explain to users, because the
failures which trigger for partial-update versus atomic depend on how
things happen to be implemented. Just coming from a user's perspective,
we might expect a breakdown like:

  - problems like non-fast-forward are logical policy issues, and we'd
    reject those without failing the whole transaction (in
    partial-update mode)

  - a system error like write() failing should be rare, and abandons the
    whole transaction (in either mode)

But there are some confusing cases:

  - if somebody else takes the lock and updates the ref at the same
    time, then that is handled in the ref transaction code. And so even
    though it's closer to a logical policy issue, the patch here would
    fail the whole transaction

  - likewise name conflicts (writing "refs/foo" when "refs/foo/bar"
    exists or vice versa) are more of a logical issue, but are also
    handled by the transaction code.

So I think we really have to teach the ref transaction code the notion
of non-atomic transactions, or we'll end up leaking all of those
implementation details into user-facing behavior. I.e., the ref code
needs to learn not to immediately abort if it fails one lockfile, but to
optionally keep going (if the caller specified a non-atomic flag, of
course).

For the files-backend code, I think system errors would naturally fall
out in the same code. Failing to write() or rename() is not much
different than failing to get the lock in the first place. So
"partial-update" and "non-atomic" behavior would end up the same anyway,
and we do not need to expose the third mode to the user.

For the reftable backend, syscalls inherently cover all the refs anyway
(we either commit the whole reftable update or not). So likewise there
would be no difference between partial-update and non-atomic modes (but
they would both be different from the files backend).

I suspect the surgery needed for the ref-transaction code to allow
non-atomic updates would be pretty big, though. It involves checking
every error case to make sure it is safe to continue rather than
aborting (and munging data structures to mark particular refs as
"failed, don't do anything further for this one").

So I dunno. All of my analysis assumes the breakdown of user
expectations I gave above is a reasonable one. There may be others. But
it seems like the behavior created by just this patch would be very hard
to explain, and subject to change based on implementation details.

-Peff
On Wed, Dec 08, 2021 at 04:13:48PM -0500, Jeff King wrote:
> On Wed, Dec 08, 2021 at 09:15:01AM +0100, Christian Couder wrote:
>
> > > Note that this slightly changes semantics of git-fetch(1): if we hit any
> > > unexpected errors like the reference update racing with another process,
> > > then we'll now fail to update any references, too.
> >
> > So that's one difference between the "partial-update" and the
> > "non-atomic" modes, but what you say doesn't really make me confident
> > that it's the only one.
> >
> > Also are there people who are in cases where a lot of reference
> > updates are racing, and where it's important that in such cases at
> > least some ref updates succeed?
> >
> > If yes, then maybe the 3 modes could be kept and options for
> > "partial-update" and "non-atomic" could be added. "partial-update"
> > could be the default in case the reftable backend is used, while
> > "non-atomic" would still be the default when the files backend is
> > used.
>
> I think these three modes are hard to explain to users, because the
> failures which trigger for partial-update versus atomic depend on how
> things happen to be implemented. Just coming from a user's perspective,
> we might expect a breakdown like:
>
>   - problems like non-fast-forward are logical policy issues, and we'd
>     reject those without failing the whole transaction (in
>     partial-update mode)
>
>   - a system error like write() failing should be rare, and abandons the
>     whole transaction (in either mode)
>
> But there are some confusing cases:
>
>   - if somebody else takes the lock and updates the ref at the same
>     time, then that is handled in the ref transaction code. And so even
>     though it's closer to a logical policy issue, the patch here would
>     fail the whole transaction
>
>   - likewise name conflicts (writing "refs/foo" when "refs/foo/bar"
>     exists or vice versa) are more of a logical issue, but are also
>     handled by the transaction code.
>
> So I think we really have to teach the ref transaction code the notion
> of non-atomic transactions, or we'll end up leaking all of those
> implementation details into user-facing behavior. I.e., the ref code
> needs to learn not to immediately abort if it fails one lockfile, but
> to optionally keep going (if the caller specified a non-atomic flag, of
> course).

That would probably be the most flexible approach for the future, too.
There's going to be several locations which could benefit from such a
change, where we do want to get the performance benefits of using a
single transaction while continuing to exhibit the same behaviour in
edge cases where only a subset of refs could be updated.

For what it's worth, I think that such a new mode should likely only
kick in when "preparing" a transaction: this is the last step before
"committing" it and would thus be the only place where the caller has a
chance to introspect what really has been queued up and what had to be
dropped from the transaction due to races or whatnot. As long as callers
get a list of all dropped refs, including why they have been dropped,
they can also still communicate this information to the user.
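In code, I'd picture the caller's side roughly like the sketch below
(assuming the surrounding context of builtin/fetch.c; the begin, update,
prepare, commit and free calls mirror the existing refs API, while the
partial-update behaviour and the rejected-update iterator are invented
for illustration):

	struct strbuf err = STRBUF_INIT;
	struct ref_transaction *transaction;
	struct ref *ref;

	transaction = ref_transaction_begin(&err);
	for (ref = ref_map; ref; ref = ref->next)
		if (ref_transaction_update(transaction, ref->name,
					   &ref->new_oid, &ref->old_oid,
					   0, "fetch", &err))
			die("%s", err.buf);

	/*
	 * Hypothetical: with something like REF_TRANSACTION_ALLOW_PARTIAL,
	 * "prepare" would drop updates which lost a race (lock taken by
	 * somebody else, D/F conflict) instead of failing the whole
	 * transaction, and only abort on unexpected system-level errors.
	 */
	if (ref_transaction_prepare(transaction, &err))
		die("%s", err.buf);

	/* Hypothetical iterator over updates dropped during "prepare". */
	for_each_rejected_update(transaction, report_rejected_update, NULL);

	if (ref_transaction_commit(transaction, &err))
		die("%s", err.buf);
	ref_transaction_free(transaction);

The queueing loop stays as-is; the only new surface would be the one
spot after "prepare" where the caller can report whatever was dropped
before committing the remainder.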
> For the files-backend code, I think system errors would naturally fall
> out in the same code. Failing to write() or rename() is not much
> different than failing to get the lock in the first place. So
> "partial-update" and "non-atomic" behavior would end up the same anyway,
> and we do not need to expose the third mode to the user.

I think I disagree here. Failing to write() to me is quite different
from failing to take a lock: the first one is an unexpected system-level
error and brings us into a situation where we ain't got no clue why it
happened. The second one is a logical error that is entirely expected
given that lockfiles are explicitly designed for this failure mode, so
we know why they happen. With this in mind, I'd argue that we should
only continue with the transaction in the latter case, and abort on
unexpected system-level errors.

> For the reftable backend, syscalls inherently cover all the refs anyway
> (we either commit the whole reftable update or not). So likewise there
> would be no difference between partial-update and non-atomic modes (but
> they would both be different from the files backend).
>
> I suspect the surgery needed for the ref-transaction code to allow
> non-atomic updates would be pretty big, though. It involves checking
> every error case to make sure it is safe to continue rather than
> aborting (and munging data structures to mark particular refs as
> "failed, don't do anything further for this one").

I hope that it's not going to be that bad if we restrict it to the
"prepare" phase, but that may just be wishful thinking.

> So I dunno. All of my analysis assumes the breakdown of user
> expectations I gave above is a reasonable one. There may be others. But
> it seems like the behavior created by just this patch would be very hard
> to explain, and subject to change based on implementation details.
>
> -Peff

I think your analysis makes sense. While I think that the three modes
aren't as bad for a command like git-fetch(1), the
"single-tx-partial-update" mode would really only be a special case of
our current default, except that we use only a single transaction. Users
shouldn't really need to care about this, and we should do the right
thing by default. The discussion mostly came to life because our current
implementation of reference transactions is lacking the ability to
handle this partial-update mode well (and it didn't really have to until
now), but pushing this technical limitation onto the user is probably
not something we should do.

In the end I agree that we ought to extend the reftx mechanism to
support this usecase. While it's more work up-front, I expect that other
commands could benefit in a similar way without having to add
`--partial-atomic-reference-updates` switches to all of them.

Patrick
Patrick Steinhardt <ps@pks.im> writes:

> instead of creating one slice per updated ref. While this inefficiency
> can be easily mitigated by using the `--atomic` flag, this flag cannot
> be used in contexts where we want partial-update semantics.

Interesting and puzzling. In today's code, we use a single transaction
when "atomic" is asked for, so that we can open a transaction, prepare a
bunch of ref updates, and say "commit" to commit all of them and let the
ref_transaction layer make the whole thing all-or-none.

If we now use a single transaction for an update of two refs that does
not have to be atomic, it is surprising (from the diffstat) that we can
do so without changing anything in the ref_transaction layer. Doesn't
the caller at least need to say "this transaction is best-effort
'partial-update' (whatever it means)" vs "this transaction is
all-or-none"? And doesn't the ref_transaction layer now need to
implement the 'partial-update' thing?

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/fetch.c | 78 ++++++++++++++++---------------------------------
>  1 file changed, 25 insertions(+), 53 deletions(-)
Jeff King <peff@peff.net> writes:

> I suspect the surgery needed for the ref-transaction code to allow
> non-atomic updates would be pretty big, though. It involves checking
> every error case to make sure it is safe to continue rather than
> aborting (and munging data structures to mark particular refs as
> "failed, don't do anything further for this one").
>
> So I dunno. All of my analysis assumes the breakdown of user
> expectations I gave above is a reasonable one. There may be others. But
> it seems like the behavior created by just this patch would be very hard
> to explain, and subject to change based on implementation details.

Oh, I should have read your analysis before reacting to the original
message myself ;-)

Yes, current callers of ref-transaction code may have some logic to
decide that a proposed change (e.g. a non-fast-forward) is not even
worth handing to the ref API, but that does feel like an unnecessary
implementation detail, and a true "partial transaction" needs
cooperation from the ref-transaction layer.

And when it is done, we do not have to explain anything to the user
beyond what we already do. The "--atomic" option will make it
all-or-none, and without it, changes to each ref may or may not fail
individually, each with its own reason, without correlation to the
outcome of the changes to any other refs. And the use of single or
multiple transactions just becomes an implementation detail of
non-atomic updates.
On Thu, Dec 09, 2021 at 08:11:34AM +0100, Patrick Steinhardt wrote:

> > For the files-backend code, I think system errors would naturally fall
> > out in the same code. Failing to write() or rename() is not much
> > different than failing to get the lock in the first place. So
> > "partial-update" and "non-atomic" behavior would end up the same anyway,
> > and we do not need to expose the third mode to the user.
>
> I think I disagree here. Failing to write() to me is quite different
> from failing to take a lock: the first one is an unexpected system-level
> error and brings us into a situation where we ain't got no clue why it
> happened. The second one is a logical error that is entirely expected
> given that lockfiles are explicitly designed for this failure mode, so
> we know why they happen. With this in mind, I'd argue that we should
> only continue with the transaction in the latter case, and abort on
> unexpected system-level errors.

Just to be clear, it's not that I necessarily think these error types
are logically related. I only meant that once you are detecting and
recovering from one type, it would be easy to implement it either way.

I'd be OK with either type of behavior. If that was the only difference
between partial-update and non-atomic, though, I'm not sure if that
merits exposing the complexity of a "third mode" to the user. But I
don't feel strongly about it either way.

> > I suspect the surgery needed for the ref-transaction code to allow
> > non-atomic updates would be pretty big, though. It involves checking
> > every error case to make sure it is safe to continue rather than
> > aborting (and munging data structures to mark particular refs as
> > "failed, don't do anything further for this one").
>
> I hope that it's not going to be that bad if we restrict it to the
> "prepare" phase, but that may just be wishful thinking.

Yeah, maybe. :) I didn't look closely, so it may not be too bad. I just
remember the refs system being very finicky about things like failure,
races, etc. But I'm sure you'll figure it out once you start looking
closely. :)

One thing to watch out for is that in the files backend, _part_ of the
update may be shared by multiple refs: updating the packed-refs file (if
we are deleting refs). So if you are deleting "refs/heads/foo" and
"refs/heads/bar", but taking the lock on "foo" fails, you'd want to make
sure only to delete "bar" from packed-refs, not "foo".

-Peff
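To sketch that pitfall in code (the "rejected" flag and
drop_from_packed_refs() are hypothetical, and the loop assumes the
transaction's internal array of queued updates):

	/*
	 * Rewriting packed-refs during a best-effort deletion: every
	 * queued update must be consulted individually, because some of
	 * them may have failed to take their loose lock.
	 */
	for (size_t i = 0; i < transaction->nr; i++) {
		struct ref_update *update = transaction->updates[i];

		if (update->rejected)
			continue; /* "refs/heads/foo": lock failed, keep it packed */

		/* "refs/heads/bar": lock held, so drop it from packed-refs */
		drop_from_packed_refs(update->refname);
	}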
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f7abbc31ff..c4cfd55452 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -641,9 +641,6 @@ static struct ref *get_ref_map(struct remote *remote,
 	return ref_map;
 }
 
-#define STORE_REF_ERROR_OTHER 1
-#define STORE_REF_ERROR_DF_CONFLICT 2
-
 static int s_update_ref(const char *action,
 			struct ref *ref,
 			struct ref_transaction *transaction,
@@ -651,7 +648,6 @@ static int s_update_ref(const char *action,
 {
 	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
-	struct ref_transaction *our_transaction = NULL;
 	struct strbuf err = STRBUF_INIT;
 	int ret;
 
@@ -661,44 +657,12 @@ static int s_update_ref(const char *action,
 		rla = default_rla.buf;
 
 	msg = xstrfmt("%s: %s", rla, action);
 
-	/*
-	 * If no transaction was passed to us, we manage the transaction
-	 * ourselves. Otherwise, we trust the caller to handle the transaction
-	 * lifecycle.
-	 */
-	if (!transaction) {
-		transaction = our_transaction = ref_transaction_begin(&err);
-		if (!transaction) {
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
 	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
 				     check_old ? &ref->old_oid : NULL,
 				     0, msg, &err);
-	if (ret) {
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
-	}
-
-	if (our_transaction) {
-		switch (ref_transaction_commit(our_transaction, &err)) {
-		case 0:
-			break;
-		case TRANSACTION_NAME_CONFLICT:
-			ret = STORE_REF_ERROR_DF_CONFLICT;
-			goto out;
-		default:
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
-out:
-	ref_transaction_free(our_transaction);
 	if (ret)
 		error("%s", err.buf);
+
 	strbuf_release(&err);
 	free(msg);
 	return ret;
@@ -1107,12 +1071,10 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
-	if (atomic_fetch) {
-		transaction = ref_transaction_begin(&err);
-		if (!transaction) {
-			error("%s", err.buf);
-			goto abort;
-		}
+	transaction = ref_transaction_begin(&err);
+	if (!transaction) {
+		error("%s", err.buf);
+		goto abort;
 	}
 
 	prepare_format_display(ref_map);
@@ -1229,21 +1191,31 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
-	if (!rc && transaction) {
-		rc = ref_transaction_commit(transaction, &err);
-		if (rc) {
-			error("%s", err.buf);
-			goto abort;
-		}
+	if (rc && atomic_fetch) {
+		error(_("aborting reference updates because of atomic fetch"));
+		goto abort;
 	}
 
-	if (!rc)
-		commit_fetch_head(&fetch_head);
-
-	if (rc & STORE_REF_ERROR_DF_CONFLICT)
+	switch (ref_transaction_commit(transaction, &err)) {
+	case 0:
+		break;
+	case TRANSACTION_NAME_CONFLICT:
 		error(_("some local refs could not be updated; try running\n"
 			" 'git remote prune %s' to remove any old, conflicting "
 			"branches"), remote_name);
+		rc = -1;
+		break;
+	default:
+		error("%s", err.buf);
+		rc = -1;
+		break;
+	}
+
+	if (rc && atomic_fetch)
+		goto abort;
+
+	if (!rc)
+		commit_fetch_head(&fetch_head);
 
 	if (advice_enabled(ADVICE_FETCH_SHOW_FORCED_UPDATES)) {
 		if (!fetch_show_forced_updates) {
When git-fetch(1) isn't called with the `--atomic` flag, then each
reference will be updated in a separate transaction. As a result, even
if we failed to update a subset of transactions, the remaining refs will
be modified after the command has finished. While this pattern is
reasonably efficient with the files backend where we'd lock and write
each file separately anyway, the upcoming reftable backend could handle
such an update a lot more efficiently if it was batched into a single
transaction given that it could then create a single new reftable slice
instead of creating one slice per updated ref. While this inefficiency
can be easily mitigated by using the `--atomic` flag, this flag cannot
be used in contexts where we want partial-update semantics.

Convert git-fetch(1) to always use a single reference transaction,
regardless of whether it is called with `--atomic` or not. The only
remaining difference between both modes is that with `--atomic` set,
we'd abort the transaction in case at least one reference cannot be
updated.

Note that this slightly changes semantics of git-fetch(1): if we hit any
unexpected errors like the reference update racing with another process,
then we'll now fail to update any references, too.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 78 ++++++++++++++++---------------------------------
 1 file changed, 25 insertions(+), 53 deletions(-)

Hi,

until now, we have been quite lenient with creating lots of reference
transactions even though they could've been bundled together into a
single transaction. After all, it didn't have much of a downside in most
contexts with the files backend: we'd have to lock each loose ref
separately anyway. I'd like to get some feedback on changing our stance
here, due to multiple reasons:

  - The upcoming reftable backend will be more efficient if we use a
    single transaction to bundle together multiple ref updates given
    that it only needs to write one new slice instead of one per
    update.

  - Syncing refs to disk can be batched more efficiently if we bundle
    ref updates. See my initial patch series to implement fsync-ing
    refs [1] and Peff's benchmarks [2] demonstrating that fetches may
    become a lot slower.

  - The reference-transaction hook can be used more efficiently given
    that it would only need to execute twice, instead of twice per
    updated ref. It also has a more global view of what's happening.
    While this is a concern of mine, it's a non-reason in the context
    of the Git project given that we really ought not to change
    semantics only to appease this hook.

With these reasons in mind, I'm wondering whether we want to accept
refactorings which convert existing code to use batched reference
transactions. While the potential performance improvements are a rather
obvious upside, the downside is that it would definitely change the
failure mode in many cases.

The example I have here with git-fetch(1) would mean that if we were to
race with any other process or if any other unexpected error occurs
which leads us to die, then we'll not commit any change to disk. This
can be seen as an improvement in consistency, but it can also be seen as
a change which breaks current semantics of trying to do as much work as
possible.

I'd thus love to hear about any opinions on this topic.

Patrick

[1]: <cover.1636544377.git.ps@pks.im>
[2]: <YYwvVy6AWDjkWazn@coredump.intra.peff.net>