Message ID | ddbec7598664bceee50213a41fefa248d249435e.1615813673.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | rev-parse: implement object type filter | expand |
On Mon, Mar 15, 2021 at 02:14:36PM +0100, Patrick Steinhardt wrote: > The NOT_USER_GIVEN flag of an object marks whether a flag was explicitly > provided by the user or not. The most important use case for this is > when filtering objects: only objects that were not explicitly requested > will get filtered. > > The flag is currently only set for blobs and trees, which has been fine > given that there are no filters for tags or commits currently. We're > about to extend filtering capabilities to add object type filter though, > which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's > not set, the object wouldn't get filtered at all. > > Mark unseen commit parents as NOT_USER_GIVEN when processing parents. > Like this, explicitly provided parents stay user-given and thus > unfiltered, while parents which get loaded as part of the graph walk > can be filtered. > > This commit shouldn't have any user-visible impact yet as there is no > logic to filter commits yet. I'm still scratching my head a bit to understand how NOT_USER_GIVEN can possibly be correct (as opposed to USER_GIVEN). If we visit the commit in a not-user-given context and add the flag, how do we know it wasn't _also_ visited in a user-given context? Just guessing, but perhaps the SEEN flag is saving us here? If we visit the user-given commit itself first, then we give it the SEEN flag. Then if we try to visit it again via parent traversal, we've already processed it and don't add the NOT_USER_GIVEN flag here. That seems the opposite of the order we'd usually traverse, but I think we set SEEN on each commit in prepare_revision_walk(), before we do any traversing. So I _think_ it all works even with your changes here, but I have to say this NOT_USER_GIVEN thing seems really fragile to me. Not new in your series, of course, but something we may want to look at. Just grepping around, "rev-list -g" will happily remove SEEN flags, so I suspect it interacts badly with --filter. Just trying "rev-list -g --objects --filter=object:type=blob HEAD" shows that it produces quite a lot of commits (which I think is a more fundamental problem: it is not walking the parent chain at all to assign these NOT_USER_GIVEN flags). -Peff
On Tue, Apr 06, 2021 at 01:30:57PM -0400, Jeff King wrote: > On Mon, Mar 15, 2021 at 02:14:36PM +0100, Patrick Steinhardt wrote: > > > The NOT_USER_GIVEN flag of an object marks whether a flag was explicitly > > provided by the user or not. The most important use case for this is > > when filtering objects: only objects that were not explicitly requested > > will get filtered. > > > > The flag is currently only set for blobs and trees, which has been fine > > given that there are no filters for tags or commits currently. We're > > about to extend filtering capabilities to add object type filter though, > > which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's > > not set, the object wouldn't get filtered at all. > > > > Mark unseen commit parents as NOT_USER_GIVEN when processing parents. > > Like this, explicitly provided parents stay user-given and thus > > unfiltered, while parents which get loaded as part of the graph walk > > can be filtered. > > > > This commit shouldn't have any user-visible impact yet as there is no > > logic to filter commits yet. > > I'm still scratching my head a bit to understand how NOT_USER_GIVEN can > possibly be correct (as opposed to USER_GIVEN). If we visit the commit > in a not-user-given context and add the flag, how do we know it wasn't > _also_ visited in a user-given context? > > Just guessing, but perhaps the SEEN flag is saving us here? If we visit > the user-given commit itself first, then we give it the SEEN flag. Then > if we try to visit it again via parent traversal, we've already > processed it and don't add the NOT_USER_GIVEN flag here. Yes, I think that's mostly it. > That seems the opposite of the order we'd usually traverse, but I think > we set SEEN on each commit in prepare_revision_walk(), before we do any > traversing. > > So I _think_ it all works even with your changes here, but I have to say > this NOT_USER_GIVEN thing seems really fragile to me. Not new in your > series, of course, but something we may want to look at. > > Just grepping around, "rev-list -g" will happily remove SEEN flags, so I > suspect it interacts badly with --filter. Just trying "rev-list -g > --objects --filter=object:type=blob HEAD" shows that it produces quite a > lot of commits (which I think is a more fundamental problem: it is not > walking the parent chain at all to assign these NOT_USER_GIVEN flags). I totally agree that this feels fragile, and developing this series with NOT_USER_GIVEN wasn't the most enjoyable experience either. I wouldn't love to be doing the conversion back to USER_GIVEN as part of this series, but I wouldn't oppose doing that job either. Right now I don't feel like I'm sufficiently sure that it's working for all cases, and indeed your example with "rev-list -g" already shows one case where it's breaking. So let me know whether I should add the conversion as preparatory step. Patrick
diff --git a/revision.c b/revision.c index b78733f508..26f422f50d 100644 --- a/revision.c +++ b/revision.c @@ -1123,7 +1123,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit, mark_parents_uninteresting(p); if (p->object.flags & SEEN) continue; - p->object.flags |= SEEN; + p->object.flags |= (SEEN | NOT_USER_GIVEN); if (list) commit_list_insert_by_date(p, list); if (queue) @@ -1165,7 +1165,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit, } p->object.flags |= left_flag; if (!(p->object.flags & SEEN)) { - p->object.flags |= SEEN; + p->object.flags |= (SEEN | NOT_USER_GIVEN); if (list) commit_list_insert_by_date(p, list); if (queue) diff --git a/revision.h b/revision.h index e6be3c845e..f1f324a19b 100644 --- a/revision.h +++ b/revision.h @@ -44,9 +44,6 @@ /* * Indicates object was reached by traversal. i.e. not given by user on * command-line or stdin. - * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support - * filtering trees and blobs, but it may be useful to support filtering commits - * in the future. */ #define NOT_USER_GIVEN (1u<<25) #define TRACK_LINEAR (1u<<26)
The NOT_USER_GIVEN flag of an object marks whether a flag was explicitly provided by the user or not. The most important use case for this is when filtering objects: only objects that were not explicitly requested will get filtered. The flag is currently only set for blobs and trees, which has been fine given that there are no filters for tags or commits currently. We're about to extend filtering capabilities to add object type filter though, which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's not set, the object wouldn't get filtered at all. Mark unseen commit parents as NOT_USER_GIVEN when processing parents. Like this, explicitly provided parents stay user-given and thus unfiltered, while parents which get loaded as part of the graph walk can be filtered. This commit shouldn't have any user-visible impact yet as there is no logic to filter commits yet. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- revision.c | 4 ++-- revision.h | 3 --- 2 files changed, 2 insertions(+), 5 deletions(-)