mbox series

[v3,0/2] pack-bitmap.c: avoid exposing absolute paths

Message ID cover.1667470481.git.dyroneteng@gmail.com (mailing list archive)
Headers show
Series pack-bitmap.c: avoid exposing absolute paths | expand

Message

Teng Long Nov. 4, 2022, 3:17 a.m. UTC
From: Teng Long <dyroneteng@gmail.com>

Diff since v2:

1. trace2 log will include the path info of pack. not only "basename".
2. A new commit which removes unnecessary "open_pack_index()" calls in
"open_midx_bitmap_1()".
3. The patch v2 still base on 2.37, rebase on 'master' to prevent conflicts.

Thanks.

Teng Long (2):
  pack-bitmap.c: avoid exposing absolute paths
  pack-bitmap.c: remove unnecessary "open_pack_index()" calls

 pack-bitmap.c           | 15 ++++++++-------
 t/t5310-pack-bitmaps.sh |  5 +++--
 2 files changed, 11 insertions(+), 9 deletions(-)

Range-diff against v2:
1:  1591e7ee52 < -:  ---------- pack-bitmap.c: avoid exposing absolute paths
-:  ---------- > 1:  de941f58f9 pack-bitmap.c: avoid exposing absolute paths
-:  ---------- > 2:  7ac9b859f3 pack-bitmap.c: remove unnecessary "open_pack_index()" calls

Comments

Taylor Blau Nov. 4, 2022, 10:13 p.m. UTC | #1
On Fri, Nov 04, 2022 at 11:17:08AM +0800, Teng Long wrote:
> From: Teng Long <dyroneteng@gmail.com>
>
> Diff since v2:
>
> 1. trace2 log will include the path info of pack. not only "basename".
> 2. A new commit which removes unnecessary "open_pack_index()" calls in
> "open_midx_bitmap_1()".
> 3. The patch v2 still base on 2.37, rebase on 'master' to prevent conflicts.
>
> Thanks.

This version is looking pretty good, and I very much appreciate the
rebase onto 'master' in the meantime.

I left a couple of notes throughout which I think merit a small reroll.
Otherwise, this is looking just about ready to go.


Thanks,
Taylor
Teng Long Nov. 10, 2022, 7:10 a.m. UTC | #2
From: Teng Long <dyroneteng@gmail.com>

Diff since v2:

* remove unnecessary comments.
* use "GIT_TRACE2_EVENT" instead of "GIT_TRACE_PERF".
* improve commit message of [1/2].

Thanks.

Teng Long (2):
  pack-bitmap.c: remove unnecessary "open_pack_index()" calls
  pack-bitmap.c: avoid exposing absolute paths

 pack-bitmap.c           | 13 ++++++-------
 t/t5310-pack-bitmaps.sh |  5 +++--
 2 files changed, 9 insertions(+), 9 deletions(-)

Range-diff against v2:
2:  7ac9b859f3 ! 1:  aaeb021538 pack-bitmap.c: remove unnecessary "open_pack_index()" calls
    @@ Metadata
      ## Commit message ##
         pack-bitmap.c: remove unnecessary "open_pack_index()" calls
     
    -    Everytime when calling "open_pack_bitmap_1()", we will firstly call
    -    "open_pack_index(packfile)" to check the index, then further check
    -    again in "is_pack_valid()" before mmap the bitmap file. So, let's
    -    remove the first check here.
    +    When trying to open a pack bitmap, we call open_pack_bitmap_1() in a
    +    loop, during which it tries to open up the pack index corresponding
    +    with each available pack.
     
    -    The relate discussion:
    -    https://public-inbox.org/git/Y2IiSU1L+bJPUioV@coredump.intra.peff.net/#t
    +    It's likely that we'll end up relying on objects in that pack later
    +    in the process (in which case we're doing the work of opening the
    +    pack index optimistically), but not guaranteed.
    +
    +    For instance, consider a repository with a large number of small
    +    packs, and one large pack with a bitmap. If we see that bitmap pack
    +    last in our loop which calls open_pack_bitmap_1(), the current code
    +    will have opened *all* pack index files in the repository. If the
    +    request can be served out of the bitmapped pack alone, then the time
    +    spent opening these idx files was wasted.S
    +
    +    Since open_pack_bitmap_1() calls is_pack_valid() later on (which in
    +    turns calls open_pack_index() itself), we can just drop the earlier
    +    call altogether.
     
         Signed-off-by: Teng Long <dyroneteng@gmail.com>
     
1:  de941f58f9 ! 2:  9d5a491887 pack-bitmap.c: avoid exposing absolute paths
    @@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
      		get_midx_filename(&buf, midx->object_dir);
     -		/* ignore extra bitmap file; we can only handle one */
     -		warning(_("ignoring extra bitmap file: '%s'"), buf.buf);
    -+		/* ignore extra midx bitmap files; we can only handle one */
     +		trace2_data_string("bitmap", the_repository,
     +				   "ignoring extra midx bitmap file", buf.buf);
      		close(fd);
    @@ pack-bitmap.c: static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, st
      	if (bitmap_git->pack || bitmap_git->midx) {
     -		/* ignore extra bitmap file; we can only handle one */
     -		warning(_("ignoring extra bitmap file: '%s'"), packfile->pack_name);
    -+		/* ignore extra bitmap files; we can only handle one */
     +		trace2_data_string("bitmap", the_repository,
     +				   "ignoring extra bitmap file", packfile->pack_name);
      		close(fd);
    @@ t/t5310-pack-bitmaps.sh: test_bitmap_cases () {
      
     -			git rev-list --use-bitmap-index HEAD 2>err &&
     -			grep "ignoring extra bitmap file" err
    -+			GIT_TRACE2_PERF=$(pwd)/trace2.txt git rev-list --use-bitmap-index HEAD &&
    ++			GIT_TRACE2_EVENT=$(pwd)/trace2.txt git rev-list --use-bitmap-index HEAD &&
     +			grep "opened bitmap" trace2.txt &&
     +			grep "ignoring extra bitmap" trace2.txt
      		)
Taylor Blau Nov. 11, 2022, 10:26 p.m. UTC | #3
On Thu, Nov 10, 2022 at 03:10:10PM +0800, Teng Long wrote:
> From: Teng Long <dyroneteng@gmail.com>
>
> Diff since v2:
>
> * remove unnecessary comments.
> * use "GIT_TRACE2_EVENT" instead of "GIT_TRACE_PERF".
> * improve commit message of [1/2].

Thanks, this version looks great. I'm very satisfied with where it ended
up, and I'm feeling comfortable with merging it down.

Thanks,
Taylor
Jeff King Nov. 14, 2022, 10:23 p.m. UTC | #4
On Fri, Nov 11, 2022 at 05:26:39PM -0500, Taylor Blau wrote:

> On Thu, Nov 10, 2022 at 03:10:10PM +0800, Teng Long wrote:
> > From: Teng Long <dyroneteng@gmail.com>
> >
> > Diff since v2:
> >
> > * remove unnecessary comments.
> > * use "GIT_TRACE2_EVENT" instead of "GIT_TRACE_PERF".
> > * improve commit message of [1/2].
> 
> Thanks, this version looks great. I'm very satisfied with where it ended
> up, and I'm feeling comfortable with merging it down.

Me too. If we wanted to go further, there are two obvious next steps.

One, we can break out of the bitmap loop early if we're not tracing:

diff --git a/pack-bitmap.c b/pack-bitmap.c
index aaa2d9a104..3b6c2f804a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -527,8 +527,15 @@ static int open_pack_bitmap(struct repository *r,
 	assert(!bitmap_git->map);
 
 	for (p = get_all_packs(r); p; p = p->next) {
-		if (open_pack_bitmap_1(bitmap_git, p) == 0)
+		if (open_pack_bitmap_1(bitmap_git, p) == 0) {
 			ret = 0;
+			/*
+			 * The only reason to keep looking is to report
+			 * duplicates.
+			 */
+			if (!trace2_is_enabled())
+				break;
+		}
 	}
 
 	return ret;

I doubt this buys us much in practice. After patch 2, looking for extra
bitmaps is much cheaper. It's one open() call per pack (which will
return ENOENT normally) looking for a bitmap. And while it's only 2
lines of code, it does increase coupling of assumptions between the two
functions. So maybe not worth doing. I dunno.

And two, we could complain when there is both a midx and a pack bitmap,
like so:

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 3b6c2f804a..44a80ed8f2 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -524,8 +524,6 @@ static int open_pack_bitmap(struct repository *r,
 	struct packed_git *p;
 	int ret = -1;
 
-	assert(!bitmap_git->map);
-
 	for (p = get_all_packs(r); p; p = p->next) {
 		if (open_pack_bitmap_1(bitmap_git, p) == 0) {
 			ret = 0;
@@ -559,11 +557,20 @@ static int open_midx_bitmap(struct repository *r,
 static int open_bitmap(struct repository *r,
 		       struct bitmap_index *bitmap_git)
 {
+	int found = 0;
+
 	assert(!bitmap_git->map);
 
-	if (!open_midx_bitmap(r, bitmap_git))
-		return 0;
-	return open_pack_bitmap(r, bitmap_git);
+	found = !open_midx_bitmap(r, bitmap_git);
+
+	/*
+	 * these will all be skipped if we opened a midx bitmap; but run it
+	 * anyway if tracing is enabled to report the duplicates
+	 */
+	if (!found || trace2_is_enabled())
+		found |= !open_pack_bitmap(r, bitmap_git);
+
+	return found ? 0 : -1;
 }
 
 struct bitmap_index *prepare_bitmap_git(struct repository *r)

But I'm not sure if that is even something we want. I know we retained
pack bitmaps as a quick recovery mechanism while test-deploying midx
bitmaps. OTOH, now that the feature is stable, I doubt anybody else
would ever do that.

It also suffers from the same coupling issues as the other.

So I don't know that either is worth pursuing (hence this message and
not fully prepared patches), but these are just the two leftover things
I noticed from the series, so I wanted to record them for posterity.

-Peff
Teng Long Nov. 17, 2022, 2:19 p.m. UTC | #5
> Me too. If we wanted to go further, there are two obvious next steps.
>
> One, we can break out of the bitmap loop early if we're not tracing:

> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index aaa2d9a104..3b6c2f804a 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -527,8 +527,15 @@ static int open_pack_bitmap(struct repository *r,
>  	assert(!bitmap_git->map);
>
>  	for (p = get_all_packs(r); p; p = p->next) {
> -		if (open_pack_bitmap_1(bitmap_git, p) == 0)
> +		if (open_pack_bitmap_1(bitmap_git, p) == 0) {
>  			ret = 0;
> +			/*
> +			 * The only reason to keep looking is to report
> +			 * duplicates.
> +			 */
> +			if (!trace2_is_enabled())
> +				break;
> +		}
>  	}
>
>  	return ret;
>
> I doubt this buys us much in practice. After patch 2, looking for extra
> bitmaps is much cheaper. It's one open() call per pack (which will
> return ENOENT normally) looking for a bitmap. And while it's only 2
> lines of code, it does increase coupling of assumptions between the two
> functions. So maybe not worth doing. I dunno.

I agree and I think it's reasonable.

So If I bring it into the patch how about the commit message:

Author: Jeff King <peff@peff.net>
Date:   Thu Nov 17 19:55:23 2022 +0800

    pack-bitmap.c: break out of the bitmap loop early if not tracing

    When we successfully open a bitmap, we will continue to try to open
    other packs, and when trace2 is enabled, we will report any subsequent
    bitmap ignored information in the log. So when we find that trace2 is
    not enabled, we can actually terminate the loop early.

    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Teng Long <dyroneteng@gmail.com>

> And two, we could complain when there is both a midx and a pack bitmap,
> like so:
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 3b6c2f804a..44a80ed8f2 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -524,8 +524,6 @@ static int open_pack_bitmap(struct repository *r,
>  	struct packed_git *p;
>  	int ret = -1;
>
> -	assert(!bitmap_git->map);
> -
>  	for (p = get_all_packs(r); p; p = p->next) {
>  		if (open_pack_bitmap_1(bitmap_git, p) == 0) {
>  			ret = 0;
> @@ -559,11 +557,20 @@ static int open_midx_bitmap(struct repository *r,
>  static int open_bitmap(struct repository *r,
>  		       struct bitmap_index *bitmap_git)
>  {
> +	int found = 0;
> +
>  	assert(!bitmap_git->map);
>
> -	if (!open_midx_bitmap(r, bitmap_git))
> -		return 0;
> -	return open_pack_bitmap(r, bitmap_git);
> +	found = !open_midx_bitmap(r, bitmap_git);
> +
> +	/*
> +	 * these will all be skipped if we opened a midx bitmap; but run it
> +	 * anyway if tracing is enabled to report the duplicates
> +	 */
> +	if (!found || trace2_is_enabled())
> +		found |= !open_pack_bitmap(r, bitmap_git);
> +
> +	return found ? 0 : -1;
>  }
>
>  struct bitmap_index *prepare_bitmap_git(struct repository *r)
>
> But I'm not sure if that is even something we want. I know we retained
> pack bitmaps as a quick recovery mechanism while test-deploying midx
> bitmaps. OTOH, now that the feature is stable, I doubt anybody else
> would ever do that.
>
> It also suffers from the same coupling issues as the other.
>
> So I don't know that either is worth pursuing (hence this message and
> not fully prepared patches), but these are just the two leftover things
> I noticed from the series, so I wanted to record them for posterity.

Since this is an internal mechanism, and we are doing reminders in trace2, so
the diff looks good to me. How about the commit message if I need to take it:

Author: Jeff King <peff@peff.net>
Date:   Thu Nov 17 20:25:18 2022 +0800

    pack-bitmap.c: trace bitmap ignore logs when midx-bitmap is found

    We retained pack bitmaps as a quick recovery mechanism while
    test-deploying midx bitmaps. This is an internal mechanism, and we
    want to expose this rule externally through trace2 so that end users,
    repo-maintainers, and debuggers know what is happening in the process.

    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Teng Long <dyroneteng@gmail.com>

Thank you Peff for providing such detailed suggestions for improvement.
Jeff King Nov. 17, 2022, 3:03 p.m. UTC | #6
On Thu, Nov 17, 2022 at 10:19:33PM +0800, Teng Long wrote:

> > I doubt this buys us much in practice. After patch 2, looking for extra
> > bitmaps is much cheaper. It's one open() call per pack (which will
> > return ENOENT normally) looking for a bitmap. And while it's only 2
> > lines of code, it does increase coupling of assumptions between the two
> > functions. So maybe not worth doing. I dunno.
> 
> I agree and I think it's reasonable.
> 
> So If I bring it into the patch how about the commit message:
> [...]

Both the commit messages you proposed look accurate to me.

I was a bit skeptical on the second one (warning about skipping the
pack bitmap when a midx is present) just because we really did do that
intentionally at one point. But then I remembered that these are no
longer producing warnings, but trace output. So in that sense, nobody
will really be bothered by them.

I do wonder if there are people who have trace2 on all the time (for
performance tracing, telemetry, etc) who would find these to be junk in
their logs. IMHO the trace2 mechanism is a bit coarse grained in that we
can only check "is it on". One of the nice things about the original
trace facility is that we could stuff this behind GIT_TRACE_BITMAPS
which would really only be turned on if somebody wanted to debug
bitmaps.

-Peff
Taylor Blau Nov. 17, 2022, 9:57 p.m. UTC | #7
On Thu, Nov 17, 2022 at 10:03:06AM -0500, Jeff King wrote:
> On Thu, Nov 17, 2022 at 10:19:33PM +0800, Teng Long wrote:
>
> > > I doubt this buys us much in practice. After patch 2, looking for extra
> > > bitmaps is much cheaper. It's one open() call per pack (which will
> > > return ENOENT normally) looking for a bitmap. And while it's only 2
> > > lines of code, it does increase coupling of assumptions between the two
> > > functions. So maybe not worth doing. I dunno.
> >
> > I agree and I think it's reasonable.
> >
> > So If I bring it into the patch how about the commit message:
> > [...]
>
> Both the commit messages you proposed look accurate to me.

Yep, ditto. Teng -- would you mind sending them as a short series to the
list so that I can pick them up? Otherwise, I can do it if you don't
have time or interest.

Thanks,
Taylor
Teng Long Nov. 21, 2022, 3:27 a.m. UTC | #8
Taylor Blau <me@ttaylorr.com> writes:

> Yep, ditto. Teng -- would you mind sending them as a short series to the
> list so that I can pick them up? Otherwise, I can do it if you don't
> have time or interest.

Sorry, will send today.

Thanks