
[RFC] fetch: update refs in a single transaction

Message ID 259de62b26302c10f429d52bec42a8a954bc5ba3.1638886972.git.ps@pks.im (mailing list archive)
State New, archived

Commit Message

Patrick Steinhardt Dec. 7, 2021, 2:24 p.m. UTC
When git-fetch(1) isn't called with the `--atomic` flag, then each
reference will be updated in a separate transaction. As a result, even
if we failed to update a subset of transactions, the remaining refs will
be modified after the command has finished. While this pattern is
reasonably efficient with the files backend where we'd lock and write
each file separately anyway, the upcoming reftable backend could handle
such an update a lot more efficiently if it was batched into a single
transaction given that it could then create a single new reftable slice
instead of creating one slice per updated ref. While this inefficiency
can be easily mitigated by using the `--atomic` flag, this flag cannot
be used in contexts where we want partial-update semantics.

Convert git-fetch(1) to always use a single reference transaction,
regardless of whether it is called with `--atomic` or not. The only
remaining difference between both modes is that with `--atomic` set,
we'd abort the transaction in case at least one reference cannot be
updated.

Note that this slightly changes semantics of git-fetch(1): if we hit any
unexpected errors like the reference update racing with another process,
then we'll now fail to update any references, too.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 78 ++++++++++++++++---------------------------------
 1 file changed, 25 insertions(+), 53 deletions(-)

Hi,

until now, we have been quite lenient with creating lots of reference
transactions even though they could've been bundled together into a
single transaction. After all, it didn't have much of a downside in most
contexts with the files backend: we'd have to lock each loose ref
separately anyway. I'd like to get some feedback on changing our stance
here, due to multiple reasons:

    - The upcoming reftable backend will be more efficient if we use a
      single transaction to bundle together multiple ref updates given
      that it only needs to write one new slice instead of one per
      update.

    - Syncing refs to disk can be batched more efficiently if we bundle
      ref updates. See my initial patch series to implement fsync-ing
      refs [1] and Peff's benchmarks [2] demonstrating that fetches may
      become a lot slower.

    - The reference-transaction hook can be used more efficiently given
      that it would only need to execute twice, instead of twice per
      updated ref. It also has a more global view of what's happening.
      While this is a concern of mine, it's a non-reason in the context
      of the Git project given that we really ought not to change
      semantics only to appease this hook.
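Concretely, the batching argued for here boils down to queueing every update into one transaction and committing exactly once, using the same `refs.h` transaction API the patch below touches. The following is only a sketch against Git's internal API (not compilable on its own; error handling trimmed, and the surrounding fetch machinery elided):

```c
struct strbuf err = STRBUF_INIT;
struct ref_transaction *transaction;
struct ref *ref;

/* One transaction for the whole fetch... */
transaction = ref_transaction_begin(&err);
if (!transaction)
	die("%s", err.buf);

/* ...into which every ref update is queued... */
for (ref = ref_map; ref; ref = ref->next)
	if (ref_transaction_update(transaction, ref->name,
				   &ref->new_oid, &ref->old_oid,
				   0, "fetch", &err))
		error("%s", err.buf);

/* ...and which is committed exactly once at the end. */
if (ref_transaction_commit(transaction, &err))
	die("%s", err.buf);
ref_transaction_free(transaction);
```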

With these reasons in mind, I'm wondering whether we want to accept
refactorings which convert existing code to use batched reference
transactions. While the potential performance improvements are a rather
obvious upside, the downside is that it would definitely change the
failure mode in many cases.

The example I have here with git-fetch(1) would mean that if we were to
race with any other process or if any other unexpected error occurs
which leads us to die, then we'll not commit any change to disk. This
can be seen as an improvement in consistency, but it can also be seen as
a change which breaks current semantics of trying to do as much work as
possible.

I'd thus love to hear about any opinions on this topic.

Patrick

[1]: <cover.1636544377.git.ps@pks.im>
[2]: <YYwvVy6AWDjkWazn@coredump.intra.peff.net>

Comments

Christian Couder Dec. 8, 2021, 8:15 a.m. UTC | #1
On Tue, Dec 7, 2021 at 11:11 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> When git-fetch(1) isn't called with the `--atomic` flag, then each
> reference will be updated in a separate transaction. As a result, even
> if we failed to update a subset of transactions, the remaining refs will
> be modified after the command has finished. While this pattern is
> reasonably efficient with the files backend where we'd lock and write
> each file separately anyway, the upcoming reftable backend could handle
> such an update a lot more efficiently if it was batched into a single
> transaction given that it could then create a single new reftable slice
> instead of creating one slice per updated ref. While this inefficiency
> can be easily mitigated by using the `--atomic` flag, this flag cannot
> be used in contexts where we want partial-update semantics.

So it seems to me that there are 3 possible behaviors/modes:

  - "atomic": a single transaction with all or nothing semantics,
currently implemented with --atomic
  - "partial-update": a single transaction with partial-update
semantics, the new behavior this patch is implementing
  - "non-atomic": many transactions (one per ref?), currently the
default when --atomic isn't passed

Yeah, "partial-update" seems better than "non-atomic" when the
reftable backend is used. But what happens when "partial-update" is
used with the files backend?

> Convert git-fetch(1) to always use a single reference transaction,
> regardless of whether it is called with `--atomic` or not. The only
> remaining difference between both modes is that with `--atomic` set,
> we'd abort the transaction in case at least one reference cannot be
> updated.

Nit: I would say "as soon as one reference cannot be updated"

> Note that this slightly changes semantics of git-fetch(1): if we hit any
> unexpected errors like the reference update racing with another process,
> then we'll now fail to update any references, too.

So that's one difference between the "partial-update" and the
"non-atomic" modes, but what you say doesn't really make me confident
that it's the only one.

Also are there people who are in cases where a lot of reference
updates are racing, and where it's important that in such cases at
least some ref updates succeed?

If yes, then maybe the 3 modes could be kept and options for
"partial-update" and "non-atomic" could be added. "partial-update"
could be the default in case the reftable backend is used, while
"non-atomic" would still be the default when the files backend is
used.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/fetch.c | 78 ++++++++++++++++---------------------------------
>  1 file changed, 25 insertions(+), 53 deletions(-)
>
> Hi,
>
> until now, we have been quite lenient with creating lots of reference
> transactions even though they could've been bundled together into a
> single transaction. After all, it didn't have much of a downside in most
> contexts with the files backend: we'd have to lock each loose ref
> separately anyway. I'd like to get some feedback on changing our stance
> here, due to multiple reasons:
>
>     - The upcoming reftable backend will be more efficient if we use a
>       single transaction to bundle together multiple ref updates given
>       that it only needs to write one new slice instead of one per
>       update.
>
>     - Syncing refs to disk can be batched more efficiently if we bundle
>       ref updates. See my initial patch series to implement fsync-ing
>       refs [1] and Peff's benchmarks [2] demonstrating that fetches may
>       become a lot slower.

Maybe it's explained in the pointers, but is this in the case of the
files backend or the reftable backend or both?

>     - The reference-transaction hook can be used more efficiently given
>       that it would only need to execute twice, instead of twice per
>       updated ref.

Yeah, so that could be a regression for people who expect it to run
twice per updated ref.

>       It also has a more global view of what's happening.
>       While this is a concern of mine, it's a non-reason in the context
>       of the Git project given that we really ought not to change
>       semantics only to appease this hook.

Not sure I understand what you are saying here.

> With these reasons in mind, I'm wondering whether we want to accept
> refactorings which convert existing code to use batched reference
> transactions. While the potential performance improvements are a rather
> obvious upside, the downside is that it would definitely change the
> failure mode in many cases.

Another downside is that it changes how and when reference-transaction
hooks are called.

> The example I have here with git-fetch(1) would mean that if we were to
> race with any other process or if any other unexpected error occurs
> which leads us to die, then we'll not commit any change to disk. This
> can be seen as an improvement in consistency, but it can also be seen as
> a change which breaks current semantics of trying to do as much work as
> possible.

I like the idea, but it seems a bit safer to me to have 3 different
modes, so that people can test them in real life first for some time.
Then we might later be more confident with changing the default for
some backends. I might be worrying too much though, as very few people
probably use reference-transaction hooks.
Patrick Steinhardt Dec. 8, 2021, 8:48 a.m. UTC | #2
On Wed, Dec 08, 2021 at 09:15:01AM +0100, Christian Couder wrote:
> On Tue, Dec 7, 2021 at 11:11 PM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > When git-fetch(1) isn't called with the `--atomic` flag, then each
> > reference will be updated in a separate transaction. As a result, even
> > if we failed to update a subset of transactions, the remaining refs will
> > be modified after the command has finished. While this pattern is
> > reasonably efficient with the files backend where we'd lock and write
> > each file separately anyway, the upcoming reftable backend could handle
> > such an update a lot more efficiently if it was batched into a single
> > transaction given that it could then create a single new reftable slice
> > instead of creating one slice per updated ref. While this inefficiency
> > can be easily mitigated by using the `--atomic` flag, this flag cannot
> > be used in contexts where we want partial-update semantics.
> 
> So it seems to me that there are 3 possible behaviors/modes:
> 
>   - "atomic": a single transaction with all or nothing semantics,
> currently implemented with --atomic
>   - "partial-update": a single transaction with partial-update
> semantics, the new behavior this patch is implementing
>   - "non-atomic": many transactions (one per ref?), currently the
> default when --atomic isn't passed
> 
> Yeah, "partial-update" seems better than "non-atomic" when the
> reftable backend is used. But what happens  when "partial-update" is
> used with the files backend?

Yeah, framing this as three modes captures the gist of how it can
work. Currently, we only support "atomic" and "non-atomic", and this
patch proposes to replace "non-atomic" with "partial-update". I'd be
happy to instead implement it as a third mode, but if we can find a way
to "do the right thing" without having to introduce another option, then
that would be best. But as per your comments below and my own concerns
I'm not sure we can easily implement this without breaking existing
usecases.

> > Convert git-fetch(1) to always use a single reference transaction,
> > regardless of whether it is called with `--atomic` or not. The only
> > remaining difference between both modes is that with `--atomic` set,
> > we'd abort the transaction in case at least one reference cannot be
> > updated.
> 
> Nit: I would say "as soon as one reference cannot be updated"
> 
> > Note that this slightly changes semantics of git-fetch(1): if we hit any
> > unexpected errors like the reference update racing with another process,
> > then we'll now fail to update any references, too.
> 
> So that's one difference between the "partial-update" and the
> "non-atomic" modes, but what you say doesn't really make me confident
> that it's the only one.

There can certainly be other edge cases which break, and I'm sure others
will be able to pinpoint exactly which these are.

> Also are there people who are in cases where a lot of reference
> updates are racing, and where it's important that in such cases at
> least some ref updates succeed?
>
> If yes, then maybe the 3 modes could be kept and options for
> "partial-update" and "non-atomic" could be added. "partial-update"
> could be the default in case the reftable backend is used, while
> "non-atomic" would still be the default when the files backend is
> used.

I bet there are.

> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  builtin/fetch.c | 78 ++++++++++++++++---------------------------------
> >  1 file changed, 25 insertions(+), 53 deletions(-)
> >
> > Hi,
> >
> > until now, we have been quite lenient with creating lots of reference
> > transactions even though they could've been bundled together into a
> > single transaction. After all, it didn't have much of a downside in most
> > contexts with the files backend: we'd have to lock each loose ref
> > separately anyway. I'd like to get some feedback on changing our stance
> > here, due to multiple reasons:
> >
> >     - The upcoming reftable backend will be more efficient if we use a
> >       single transaction to bundle together multiple ref updates given
> >       that it only needs to write one new slice instead of one per
> >       update.
> >
> >     - Syncing refs to disk can be batched more efficiently if we bundle
> >       ref updates. See my initial patch series to implement fsync-ing
> >       refs [1] and Peff's benchmarks [2] demonstrating that fetches may
> >       become a lot slower.
> 
> Maybe it's explained in the pointers, but is this in the case of the
> files backend or the reftable backend or both?

The discussion in [1] was in the case of the files backend: if we want
to guarantee that we never write corrupt refs to disk, then we must sync
written data to disk before moving refs into place. If using multiple
transactions, we thus need to fsync(3P) for each commit. If we're
using a single transaction though, we can amortize the costs and batch
this, where we only fsync(3P) once after all references have been
prepared.

The same is effectively true for the reftable backend though: if we want
to ensure that reftable files are safe, then we'd have to sync them to
disk, as well.

> >     - The reference-transaction hook can be used more efficiently given
> >       that it would only need to execute twice, instead of twice per
> >       updated ref.
> 
> Yeah, so that could be a regression for people who expect it to run
> twice per updated ref.

The reference-transaction hook was always a mechanism which hooks into
internals. As such, how often the hook executes and in which order was
never guaranteed and is expected to change when implementation details
change. So this is not something I'd ever call a regression: the whole
intent of the hook is to be able to see what happens internally, not to
assert semantics or the order in which updates are performed for a
specific Git command. That's kind of the trade-off of using a low-level
hook like this one.

> >       It also has a more global view of what's happening.
> >       While this is a concern of mine, it's a non-reason in the context
> >       of the Git project given that we really ought not to change
> >       semantics only to appease this hook.
> 
> Not sure I understand what you are saying here.

Previously you saw `2*len(refs)` executions of the hook, so you'd have
to piece together the puzzle to see what happens on a more global level
yourself. With the change of using a single transaction, you'd get only
two executions of the hook, where each execution has the complete
picture of all refs which are about to be updated.

> > With these reasons in mind, I'm wondering whether we want to accept
> > refactorings which convert existing code to use batched reference
> > transactions. While the potential performance improvements are a rather
> > obvious upside, the downside is that it would definitely change the
> > failure mode in many cases.
> 
> Another downside is that it changes how and when reference-transaction
> hooks are called.

As mentioned above, this is not something I'd call a downside. We cannot ever
guarantee stable execution order of this hook, or otherwise we paint
ourselves into a corner.

> > The example I have here with git-fetch(1) would mean that if we were to
> > race with any other process or if any other unexpected error occurs
> > which leads us to die, then we'll not commit any change to disk. This
> > can be seen as an improvement in consistency, but it can also be seen as
> > a change which breaks current semantics of trying to do as much work as
> > possible.
> 
> I like the idea, but it seems a bit safer to me to have 3 different
> modes, so that people can test them in real life first for some time.
> Then we might later be more confident with changing the default for
> some backends. I might be worrying too much though, as very few people
> probably use reference-transaction hooks.

Fair, I'll wait a bit for more feedback and then probably post a v2 with
a third mode.

I think changing behaviour depending on which ref backend is in use
would set a bad precedent though. Ultimately, the user shouldn't notice
which backend is in use, except that it would ideally be faster with the
new reftable
backend. But slightly changing semantics of commands depending on which
backend is active is only going to lead to confusion.

Patrick
Jeff King Dec. 8, 2021, 9:13 p.m. UTC | #3
On Wed, Dec 08, 2021 at 09:15:01AM +0100, Christian Couder wrote:

> > Note that this slightly changes semantics of git-fetch(1): if we hit any
> > unexpected errors like the reference update racing with another process,
> > then we'll now fail to update any references, too.
> 
> So that's one difference between the "partial-update" and the
> "non-atomic" modes, but what you say doesn't really make me confident
> that it's the only one.
> 
> Also are there people who are in cases where a lot of reference
> updates are racing, and where it's important that in such cases at
> least some ref updates succeed?
> 
> If yes, then maybe the 3 modes could be kept and options for
> "partial-update" and "non-atomic" could be added. "partial-update"
> could be the default in case the reftable backend is used, while
> "non-atomic" would still be the default when the files backend is
> used.

I think these three modes are hard to explain to users, because the
failures which trigger for partial-update versus atomic depend on how
things happen to be implemented. Just coming from a user's perspective,
we might expect a breakdown like:

  - problems like non-fast-forward are logical policy issues, and we'd
    reject those without failing the whole transaction (in
    partial-update mode)

  - a system error like write() failing should be rare, and abandons the
    whole transaction (in either mode)

But there are some confusing cases:

  - if somebody else takes the lock and updates the ref at the same
    time, then that is handled in the ref transaction code. And so even
    though it's closer to a logical policy issue, the patch here would
    fail the whole transaction

  - likewise name conflicts (writing "refs/foo" when "refs/foo/bar"
    exists or vice versa) are more of a logical issue, but are also
    handled by the transaction code.

So I think we really have to teach the ref transaction code the notion
of non-atomic transactions, or we'll end up leaking out all of those
implementation details in user-facing behavior. I.e., the ref code needs
to learn not to immediately abort if it fails one lockfile, but to
optionally keep going (if the caller specified a non-atomic flag, of
course).

For the files-backend code, I think system errors would naturally fall
out in the same code. Failing to write() or rename() is not much
different than failing to get the lock in the first place. So
"partial-update" and "non-atomic" behavior would end up the same anyway,
and we do not need to expose the third mode to the user.

For the reftable backend, syscalls are inherently covering all the refs
anyway (we either commit the whole reftable update or not). So likewise
there would be no difference between partial-update and non-atomic modes
(but they would both be different from the files backend).

I suspect the surgery needed for the ref-transaction code to allow
non-atomic updates would be pretty big, though. It involves checking
every error case to make sure it is safe to continue rather than
aborting (and munging data structures to mark particular refs as
"failed, don't do anything further for this one").

So I dunno. All of my analysis assumes the breakdown of user
expectations I gave above is a reasonable one. There may be others. But
it seems like the behavior created by just this patch would be very hard
to explain, and subject to change based on implementation details.

-Peff
Patrick Steinhardt Dec. 9, 2021, 7:11 a.m. UTC | #4
On Wed, Dec 08, 2021 at 04:13:48PM -0500, Jeff King wrote:
> On Wed, Dec 08, 2021 at 09:15:01AM +0100, Christian Couder wrote:
> 
> > > Note that this slightly changes semantics of git-fetch(1): if we hit any
> > > unexpected errors like the reference update racing with another process,
> > > then we'll now fail to update any references, too.
> > 
> > So that's one difference between the "partial-update" and the
> > "non-atomic" modes, but what you say doesn't really make me confident
> > that it's the only one.
> > 
> > Also are there people who are in cases where a lot of reference
> > updates are racing, and where it's important that in such cases at
> > least some ref updates succeed?
> > 
> > If yes, then maybe the 3 modes could be kept and options for
> > "partial-update" and "non-atomic" could be added. "partial-update"
> > could be the default in case the reftable backend is used, while
> > "non-atomic" would still be the default when the files backend is
> > used.
> 
> I think these three modes are hard to explain to users, because the
> failures which trigger for partial-update versus atomic depend on how
> things happen to be implemented. Just coming from a user's perspective,
> we might expect a breakdown like:
> 
>   - problems like non-fast-forward are logical policy issues, and we'd
>     reject those without failing the whole transaction (in
>     partial-update mode)
> 
>   - a system error like write() failing should be rare, and abandons the
>     whole transaction (in either mode)
> 
> But there are some confusing cases:
> 
>   - if somebody else takes the lock and updates the ref at the same
>     time, then that is handled in the ref transaction code. And so even
>     though it's closer to a logical policy issue, the patch here would
>     fail the whole transaction
> 
>   - likewise name conflicts (writing "refs/foo" when "refs/foo/bar"
>     exists or vice versa) are more of a logical issue, but are also
>     handled by the transaction code.
> 
> So I think we really have to teach the ref transaction code the notion
> of non-atomic transactions, or we'll end up leaking out all of those
> implementation details in user-facing behavior. I.e., the ref code needs
> to learn not to immediately abort if it fails one lockfile, but to
> optionally keep going (if the caller specified a non-atomic flag, of
> course).

That would probably be the most flexible approach for the future, too.
There's going to be several locations which could benefit from such a
change, where we do want to get the performance benefits of using a
single transaction while continuing to exhibit the same behaviour in
edge cases where only a subset of refs could be updated.

For what it's worth, I think that such a new mode should likely only
kick in when "preparing" a transaction: this is the last step before
"committing" it and would thus be the only place where the caller has a
chance to introspect what really has been queued up and what had to be
dropped from the transaction due to races or whatnot. As long as callers
get a list of all dropped refs, including why they have been dropped,
they can also still communicate this information to the user.
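One possible shape for this is sketched below. It is purely hypothetical — no such flag, struct, or function exists in Git today — but it shows how a "prepare" step could collect rejected updates instead of failing the whole transaction:

```c
/* Hypothetical extension, not existing API: with this flag set,
 * ref_transaction_prepare() would drop updates that fail for
 * expected, per-ref reasons (e.g. the lockfile being held by
 * another process) instead of aborting the whole transaction. */
#define REF_TRANSACTION_ALLOW_PARTIAL (1 << 0)

struct ref_update_rejection {
	const char *refname;
	const char *reason;	/* e.g. "cannot lock ref" */
};

/* After preparing, the caller would inspect what was dropped and
 * report it to the user before committing the remaining updates. */
int ref_transaction_rejections(struct ref_transaction *transaction,
			       struct ref_update_rejection **out,
			       size_t *nr);
```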

> For the files-backend code, I think system errors would naturally fall
> out in the same code. Failing to write() or rename() is not much
> different than failing to get the lock in the first place. So
> "partial-update" and "non-atomic" behavior would end up the same anyway,
> and we do not need to expose the third mode to the user.

I think I disagree here. Failing to write() to me is quite different
from failing to take a lock: the first one is an unexpected system-level
error and brings us into a situation where we have no clue why it
happened. The second one is a logical error that is entirely expected
given that lockfiles are explicitly designed for this failure mode, so
we know why they happen. With this in mind, I'd argue that we should
only continue with the transaction in the latter case, and abort on
unexpected system-level errors.

> For the reftable backend, syscalls are inherently covering all the refs
> anyway (we either commit the whole reftable update or not). So likewise
> there would be no difference between partial-update and non-atomic modes
> (but they would both be different from the files backend).
> 
> I suspect the surgery needed for the ref-transaction code to allow
> non-atomic updates would be pretty big, though. It involves checking
> every error case to make sure it is safe to continue rather than
> aborting (and munging data structures to mark particular refs as
> "failed, don't do anything further for this one").

I hope that it's not going to be that bad if we restrict it to the
"prepare" phase, but that may just be wishful thinking.

> So I dunno. All of my analysis assumes the breakdown of user
> expectations I gave above is a reasonable one. There may be others. But
> it seems like the behavior created by just this patch would be very hard
> to explain, and subject to change based on implementation details.
> 
> -Peff

I think your analysis makes sense.

While I think that the three modes aren't as bad for a command like
git-fetch(1), the "single-tx-partial-update" mode would really only be a
special case of our current default, except that it uses a single
transaction. Users shouldn't really need to care about this, and
we should do the right thing by default. The discussion mostly came to
life because our current implementation of reference transactions is
lacking the ability to handle this partial-update mode well (and it
didn't really have to until now), but pushing this technical limitation
onto the user is probably not something we should do.

In the end I agree that we ought to extend the reftx mechanism to
support this usecase. While it's more work up-front, I expect that other
commands could benefit in a similar way without having to add
`--partial-atomic-reference-updates` switches to all of them.

Patrick
Junio C Hamano Dec. 9, 2021, 9:53 p.m. UTC | #5
Patrick Steinhardt <ps@pks.im> writes:

> instead of creating one slice per updated ref. While this inefficiency
> can be easily mitigated by using the `--atomic` flag, this flag cannot
> be used in contexts where we want partial-update semantics.

Interesting and puzzling.  In today's code, we use a single
transaction when "atomic" is asked for, so that we can open a
transaction, prepare bunch of ref updates, and say "commit" to
commit all of them and let the ref_transaction layer to make the
whole thing all-or-none.  If we now use a single transaction for two
refs update that do not have to be atomic, it is surprising (from
the diffstat) that we can do so without changing anything in the
ref_transaction layer.  Doesn't the caller at least need to say
"this transaction is best-effort 'partial-update' (whatever it
means)" vs "this transaction is all-or-none"?  And doesn't the
ref_transaction layer now need to implement the 'partial-update'
thing?

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/fetch.c | 78 ++++++++++++++++---------------------------------
>  1 file changed, 25 insertions(+), 53 deletions(-)
Junio C Hamano Dec. 9, 2021, 10:16 p.m. UTC | #6
Jeff King <peff@peff.net> writes:

> I suspect the surgery needed for the ref-transaction code to allow
> non-atomic updates would be pretty big, though. It involves checking
> every error case to make sure it is safe to continue rather than
> aborting (and munging data structures to mark particular refs as
> "failed, don't do anything further for this one").
>
> So I dunno. All of my analysis assumes the breakdown of user
> expectations I gave above is a reasonable one. There may be others. But
> it seems like the behavior created by just this patch would be very hard
> to explain, and subject to change based on implementation details.

Oh, I should have read your analysis before reacting to the original
message myself ;-)

Yes, current callers of ref-transaction code may have some logic to
decide that it is not even worth telling a proposed change to the
ref API (e.g. non-fast-forward) but that does feel like an
unnecessary implementation detail, and a true "partial transaction"
needs cooperation by the ref-transaction layer.

And when it is done, we do not have to explain anything to the user
beyond what we already do.  The "--atomic" option will make it
all-or-none, and without it, changes to each ref may or may not fail
individually with its own reason, without correlation to the outcome
of the changes to any other refs.  And use of single or multiple
transactions just becomes an implementation detail of non-atomic
updates.
Jeff King Dec. 10, 2021, 9:04 a.m. UTC | #7
On Thu, Dec 09, 2021 at 08:11:34AM +0100, Patrick Steinhardt wrote:

> > For the files-backend code, I think system errors would naturally fall
> > out in the same code. Failing to write() or rename() is not much
> > different than failing to get the lock in the first place. So
> > "partial-update" and "non-atomic" behavior would end up the same anyway,
> > and we do not need to expose the third mode to the user.
> 
> I think I disagree here. Failing to write() to me is quite different
> from failing to take a lock: the first one is an unexpected system-level
> error and brings us into a situation where we have no clue why it
> happened. The second one is a logical error that is entirely expected
> given that lockfiles are explicitly designed for this failure mode, so
> we know why they happen. With this in mind, I'd argue that we should
> only continue with the transaction in the latter case, and abort on
> unexpected system-level errors.

Just to be clear, it's not that I necessarily think these error types
are logically related. I only meant that once you are detecting and
recovering from one type, it would be easy to implement it either way.

I'd be OK with either type of behavior. If that was the only difference
between partial-update and non-atomic, though, I'm not sure if that
merits exposing the complexity of a "third mode" to the user. But I
don't feel strongly about it either way.

> > I suspect the surgery needed for the ref-transaction code to allow
> > non-atomic updates would be pretty big, though. It involves checking
> > every error case to make sure it is safe to continue rather than
> > aborting (and munging data structures to mark particular refs as
> > "failed, don't do anything further for this one").
> 
> I hope that it's not going to be that bad if we restrict it to the
> "prepare" phase, but that may just be wishful thinking.

Yeah, maybe. :) I didn't look closely, so it may not be too bad. I just
remember the refs system being very finicky about things like failure,
races, etc. But I'm sure you'll figure it out once you start looking
closely. :)

One thing to watch out for is that in the files backend, _part_ of the
update may be shared by multiple refs: updating the packed-refs file (if
we are deleting refs). So if you are deleting "refs/heads/foo" and
"refs/heads/bar", but taking the lock on "foo" fails, you'd want to make
sure only to delete "bar" from packed-refs, not "foo".

-Peff

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index f7abbc31ff..c4cfd55452 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -641,9 +641,6 @@  static struct ref *get_ref_map(struct remote *remote,
 	return ref_map;
 }
 
-#define STORE_REF_ERROR_OTHER 1
-#define STORE_REF_ERROR_DF_CONFLICT 2
-
 static int s_update_ref(const char *action,
 			struct ref *ref,
 			struct ref_transaction *transaction,
@@ -651,7 +648,6 @@  static int s_update_ref(const char *action,
 {
 	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
-	struct ref_transaction *our_transaction = NULL;
 	struct strbuf err = STRBUF_INIT;
 	int ret;
 
@@ -661,44 +657,12 @@  static int s_update_ref(const char *action,
 		rla = default_rla.buf;
 	msg = xstrfmt("%s: %s", rla, action);
 
-	/*
-	 * If no transaction was passed to us, we manage the transaction
-	 * ourselves. Otherwise, we trust the caller to handle the transaction
-	 * lifecycle.
-	 */
-	if (!transaction) {
-		transaction = our_transaction = ref_transaction_begin(&err);
-		if (!transaction) {
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
 	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
 				     check_old ? &ref->old_oid : NULL,
 				     0, msg, &err);
-	if (ret) {
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
-	}
-
-	if (our_transaction) {
-		switch (ref_transaction_commit(our_transaction, &err)) {
-		case 0:
-			break;
-		case TRANSACTION_NAME_CONFLICT:
-			ret = STORE_REF_ERROR_DF_CONFLICT;
-			goto out;
-		default:
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
-out:
-	ref_transaction_free(our_transaction);
 	if (ret)
 		error("%s", err.buf);
+
 	strbuf_release(&err);
 	free(msg);
 	return ret;
@@ -1107,12 +1071,10 @@  static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
-	if (atomic_fetch) {
-		transaction = ref_transaction_begin(&err);
-		if (!transaction) {
-			error("%s", err.buf);
-			goto abort;
-		}
+	transaction = ref_transaction_begin(&err);
+	if (!transaction) {
+		error("%s", err.buf);
+		goto abort;
 	}
 
 	prepare_format_display(ref_map);
@@ -1229,21 +1191,31 @@  static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
-	if (!rc && transaction) {
-		rc = ref_transaction_commit(transaction, &err);
-		if (rc) {
-			error("%s", err.buf);
-			goto abort;
-		}
+	if (rc && atomic_fetch) {
+		error(_("aborting reference updates because of atomic fetch"));
+		goto abort;
 	}
 
-	if (!rc)
-		commit_fetch_head(&fetch_head);
-
-	if (rc & STORE_REF_ERROR_DF_CONFLICT)
+	switch (ref_transaction_commit(transaction, &err)) {
+	case 0:
+		break;
+	case TRANSACTION_NAME_CONFLICT:
 		error(_("some local refs could not be updated; try running\n"
 		      " 'git remote prune %s' to remove any old, conflicting "
 		      "branches"), remote_name);
+		rc = -1;
+		break;
+	default:
+		error("%s", err.buf);
+		rc = -1;
+		break;
+	}
+
+	if (rc && atomic_fetch)
+		goto abort;
+
+	if (!rc)
+		commit_fetch_head(&fetch_head);
 
 	if (advice_enabled(ADVICE_FETCH_SHOW_FORCED_UPDATES)) {
 		if (!fetch_show_forced_updates) {