Message ID | 20211013193127.76537-1-chooglen@google.com (mailing list archive) |
---|---|
Headers | show |
Series | remote: replace static variables with struct remote_state | expand |
Glen Choo <chooglen@google.com> writes: > While this concern is valid, I decided to keep this interface for a few > reasons: > > 1. The correct way of calling the function is 'obvious'. > 2. It is relatively easy to get the contained object (struct branch/remote) > and its containing struct repository/remote_state (e.g. we don't pass > struct branch or remote through long call chains). For "struct > branch", callers usually get the branch from the repo and use it > immediately. For "struct remote", we don't use container objects > outside of static functions. If you are interested in seeing all of > the call sites, you can see a sample commit in [3]. > 3. The separation between container/contained objects allows us to > reason better about what the correct interface is. e.g. we might be > tempted to include a backpointer from struct branch to struct > remote_state so that we can pass around struct branch and be > confident that struct branch has all of the information it needs. I am not following. None of the above reasons argue for forcing the functions that take contained object to also take its container. > However, I believe that some questions *shouldn't* be answered by > just struct branch. The prime example in this series is > branch_get_push() - it conceptually answers 'What is the pushremote > of this branch', but the behavior we want is closer to 'If > configured, give me the pushremote for the branch. Otherwise, give me > the default pushremote of the repo.'. This is arguably a relevant > detail that should be exposed to callers. It is a good example why such a function can just take an instance of branch, and the caller, (1) who does care about the fallback, can be assured that the logic falls back to the correct repository; and (2) who does not care about the fallback and sees it a mere implementation detail of "I am on this branch; give me the remote to push to", do not have to know what, other than the branch object, needs to be passed. if we explicitly record a branch object which repository it was taken from. There may be some other (real) reason where the resistance comes from, that you may not be telling us, though. But in what was described in the message I am responding to, I didn't see much convincing reason to argue _for_ keeping the contained objects ignorant of the container and forcing callers to pass both to functions that use both the container and contained to compute something. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > There may be some other (real) reason where the resistance comes > from, that you may not be telling us, though. But in what was > described in the message I am responding to, I didn't see much > convincing reason to argue _for_ keeping the contained objects > ignorant of the container and forcing callers to pass both to > functions that use both the container and contained to compute > something. I am not you, so I can only speculate, but the real reason _could_ be that it makes it simpler to formulate steps 2 and 3 mechanically. After adding "repo" parameter to a function that used to take, say, a "branch", in step 3, a future clean-up series could add a .repo member to branch objects and remove the "repo" parameter from such function. I think that approach would make more work to get to the final state, though.
Junio C Hamano <gitster@pobox.com> writes: >> However, I believe that some questions *shouldn't* be answered by >> just struct branch. The prime example in this series is >> branch_get_push() - it conceptually answers 'What is the pushremote >> of this branch', but the behavior we want is closer to 'If >> configured, give me the pushremote for the branch. Otherwise, give me >> the default pushremote of the repo.'. This is arguably a relevant >> detail that should be exposed to callers. > > It is a good example why such a function can just take an instance > of branch, and the caller, > > (1) who does care about the fallback, can be assured that the logic > falls back to the correct repository; and > > (2) who does not care about the fallback and sees it a mere > implementation detail of "I am on this branch; give me the > remote to push to", do not have to know what, other than the > branch object, needs to be passed. > > if we explicitly record a branch object which repository it was > taken from. > > There may be some other (real) reason where the resistance comes > from, that you may not be telling us, though. But in what was > described in the message I am responding to, I didn't see much > convincing reason to argue _for_ keeping the contained objects > ignorant of the container and forcing callers to pass both to > functions that use both the container and contained to compute > something. My primary line of thinking is that adding the backpointer from struct branch to struct repository adds "unnecessary" coupling between the two objects, which causes the more concrete problems of: * Inconsistency between other functions that use struct repository as a "context object". We now find ourselves having to justify why branch functions can bypass the context object using a special pointer, whereas other functions and structs (say, struct object) cannot. * Interface misuse, where callers can now dereference branch->repository even though it is meant to be internal. This is also a possible source of inconsistency. * Extra memory consumption. A counterpoint to what I said is [1], where Jonathan eventually added the backpointer from struct ref_store to struct repository, which does give the nice benefits of referential integrity and abstraction that you cited. However in most of that series, struct ref_store is meant to be the context object, but due to poor internal abstraction, ref_store ends up depending on repository in some form or another and thus the backpointer is more defensible (as opposed to passing two context objects). I do not think we are in the same position with struct branch; struct branch seems sufficiently separated to me. I'm not opposed to adding a backpointer if it helps us get to a good final state, but I currently suspect that the final state still involves passing around struct repository as a context object. [1] https://lore.kernel.org/git/xmqqlf3gib0p.fsf@gitster.g/
Junio C Hamano <gitster@pobox.com> writes: > I am not you, so I can only speculate, but the real reason _could_ > be that it makes it simpler to formulate steps 2 and 3 mechanically. This is somewhat true, however... > I think that approach would make more work to get to the final > state, though. I found that if I were to add the backpointer, it would have been better to add it early and avoid this extra work that you mention here :)
Glen Choo <chooglen@google.com> writes: > My primary line of thinking is that adding the backpointer from struct > branch to struct repository adds "unnecessary" coupling between the two > objects,... meaning, that the paragraphs below will explain why the reference is unnecessary? > ...which causes the more concrete problems of: > * Inconsistency between other functions that use struct repository as a > "context object". We now find ourselves having to justify why branch > functions can bypass the context object using a special pointer, > whereas other functions and structs (say, struct object) cannot. Given that the API has been evolved over time from a "we deal only with a single repository and everybody else can just fork into a separate repository" to "now this function can work in the specified repository", it is understandable that some still do take repository as a separate parameter, even though the original counterpart that did not take a repository pointer took an object that clearly can belong to only one repository at a time. They would need to be cleaned up, and they do not make a good excuse to add more of them, when we know we are dealing with objects that belong only to one repository, like the "remote.c" contents we have been discussing. IOW, we need to start somewhere to clean things up, so either we teach multi-repository support to remote.c from day one, or we pass a repository as a separate and redundant parameter, with the intention to later clean up. I suspect that this piece is isolated enough that it is simpler to use as a starting point, and then the callers into remote.c API can then be cleaned up, gradually extending the circle. The original counerpart may not have been about "the other object" to begin with (e.g. "we have this string (not an object); resolve it to an object name in the context of the given repository"), in which case they are fine as they are, but I think most of the contents of "remote.c" do not fall into this category. > * Interface misuse, where callers can now dereference branch->repository > even though it is meant to be internal. This is also a possible source > of inconsistency. I do not think it is meant to be internal to begin with. If we declare, after careful consideration, that an in-core branch object given by remote.c API always belongs to one and only one repository, then following the branch->repository pointer should be the documented and kosher way to find out the repository by anybody who is given the branch object. A function that takes repo and branch to pretend as if it can work with an inconsistent pair is an invidation for the interface misuse, but I do not think it is a misuse of the API for a codepath that obtained a branch instance inspecting what repository it came from. A function that takes a pair of remote and a branch object for whatever reason may want to have an assert() that makes sure they came from the same repository, by the way. Also, some of the functions involved in this topic may need to say "I have this string, please resolve it as an object name in the context of this repository", and call a function that takes a repository as "context". The repository they use must be given by their caller somehow---is there any valid case where it must be different from the other parameter (i.e. contained object) they are dealing with? I do not think so. So such a function would have to say "resolve this string in the context of the repository branch->repo" because the function was given an in-core branch instance. > * Extra memory consumption. This is true. It however is the only valid excuse I can fully agree with for being hesitant to have "where did I come from" information. We do grow each in-core branch and remote instance by an 8-byte pointer (but we reduce the stackframe consumption by 8-byte along each step in the callchain as we do not have to pass a repository as a separate pointer parameter). > objects). I do not think we are in the same position with struct branch; > struct branch seems sufficiently separated to me. Sufficiently separated in the sense that you take a branch from the parent repository and throwing it at a function with a pointer to an in-core repository for a submodule would make some sort of sense? Sorry, but I do not follow this part of your argument, at all.
Junio C Hamano <gitster@pobox.com> writes: > IOW, we need to start somewhere to clean things up, so either we > teach multi-repository support to remote.c from day one, or we pass > a repository as a separate and redundant parameter, with the > intention to later clean up. I authored this topic with the assumption that passing a 'redundant' context object was the path forward, not something to clean up. I think this explains most of the divergence in opinion. > ...I suspect that this piece is isolated > enough that it is simpler to use as a starting point, and then the > callers into remote.c API can then be cleaned up, gradually extending > the circle. Agree that this piece is a good place to start such work. > I do not think it is meant to be internal to begin with. If we > declare, after careful consideration, that an in-core branch object > given by remote.c API always belongs to one and only one repository, > then following the branch->repository pointer should be the > documented and kosher way to find out the repository by anybody who > is given the branch object. In the specific case of an in-core object that can refer only to one repository, I think this is a fairly straightforward way of maintaining referential integrity. However I don't think that the approach of "contained object pointing to container" generalizes well - if I were passed a struct remote_state and I needed to access its repository later, is the ideal interface a backpointer, or is there a deeper problem with how things are structured? IOW, whether or not we should use the backpointer seems to depend on the specifics of each use case. It might be productive to decide on guidelines, especially with regards to repository. > ...A function that takes repo and branch > to pretend as if it can work with an inconsistent pair is an > invidation for the interface misuse. Fair. >> objects). I do not think we are in the same position with struct branch; >> struct branch seems sufficiently separated to me. > > Sufficiently separated in the sense that you take a branch from the > parent repository and throwing it at a function with a pointer to an > in-core repository for a submodule would make some sort of sense? I meant "sufficiently separated" in the sense that a struct branch is meaningful to callers, even in the absence of a repository. Even in the case where we *might* care about the repository e.g. getting a default for remote_for_branch(), there are callers that make do with just branch->remote_name. This was meant to contrast to ref_store, which relies on its specific repository for much of its internals and is difficult to separate. I am worried about the knock-on effects of taking a _relatively_ clean abstraction layer in struct branch and tying it back to the repository layer. That said, you have rightly noted that we should not pretend that a branch from repo A can be thrown together with a pointer to repo B in any kind of meaningful way (or at least, not yet). A backpointer can prevent this. I'm not fully convinced either way so I'll take time to mull over it; I would also like to start on the right foot with this subsystem.