diff mbox series

[RFC] object-file.c: batched disk flushes for stream_loose_object()

Message ID 20220609030530.51746-1-chiyutianyi@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] object-file.c: batched disk flushes for stream_loose_object() | expand

Commit Message

Han Xin June 9, 2022, 3:05 a.m. UTC
Neeraj Singh[1] pointed out that if batch fsync is enabled, we should still
call prepare_loose_object_bulk_checkin() to potentially create the bulk checkin
objdir.

1. https://lore.kernel.org/git/7ba4858a-d1cc-a4eb-b6d6-4c04a5dd6ce7@gmail.com/

Signed-off-by: Han Xin <chiyutianyi@gmail.com>
---
 object-file.c                   |  3 +++
 t/t5351-unpack-large-objects.sh | 15 ++++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

Comments

Neeraj Singh June 9, 2022, 7:35 a.m. UTC | #1
On 6/8/2022 8:05 PM, Han Xin wrote:
> Neeraj Singh[1] pointed out that if batch fsync is enabled, we should still
> call prepare_loose_object_bulk_checkin() to potentially create the bulk checkin
> objdir.
> 
> 1. https://lore.kernel.org/git/7ba4858a-d1cc-a4eb-b6d6-4c04a5dd6ce7@gmail.com/
> 
> Signed-off-by: Han Xin <chiyutianyi@gmail.com>
> ---
>   object-file.c                   |  3 +++
>   t/t5351-unpack-large-objects.sh | 15 ++++++++++++++-
>   2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/object-file.c b/object-file.c
> index 2dd828b45b..3a1be74775 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2131,6 +2131,9 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
>   	char hdr[MAX_HEADER_LEN];
>   	int hdrlen;
>   
> +	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
> +		prepare_loose_object_bulk_checkin();
> +
>   	/* Since oid is not determined, save tmp file to odb path. */
>   	strbuf_addf(&filename, "%s/", get_object_directory());
>   	hdrlen = format_object_header(hdr, sizeof(hdr), OBJ_BLOB, len);
> diff --git a/t/t5351-unpack-large-objects.sh b/t/t5351-unpack-large-objects.sh
> index 461ca060b2..a66a51f7df 100755
> --- a/t/t5351-unpack-large-objects.sh
> +++ b/t/t5351-unpack-large-objects.sh
> @@ -18,7 +18,10 @@ test_expect_success "create large objects (1.5 MB) and PACK" '
>   	test_commit --append foo big-blob &&
>   	test-tool genrandom bar 1500000 >big-blob &&
>   	test_commit --append bar big-blob &&
> -	PACK=$(echo HEAD | git pack-objects --revs pack)
> +	PACK=$(echo HEAD | git pack-objects --revs pack) &&
> +	git verify-pack -v pack-$PACK.pack |
> +	    grep -E "commit|tree|blob" |
> +		sed -n -e "s/^\([0-9a-f]*\).*/\1/p" >obj-list
>   '
>   
>   test_expect_success 'set memory limitation to 1MB' '
> @@ -45,6 +48,16 @@ test_expect_success 'unpack big object in stream' '
>   	test_dir_is_empty dest.git/objects/pack
>   '
>   
> +BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch'
> +
> +test_expect_success 'unpack big object in stream (core.fsyncmethod=batch)' '
> +	prepare_dest 1m &&
> +	git $BATCH_CONFIGURATION -C dest.git unpack-objects <pack-$PACK.pack &&
> +	test_dir_is_empty dest.git/objects/pack &&
> +	git -C dest.git cat-file --batch-check="%(objectname)" <obj-list >current &&
> +	cmp obj-list current
> +'
> +
>   test_expect_success 'do not unpack existing large objects' '
>   	prepare_dest 1m &&
>   	git -C dest.git index-pack --stdin <pack-$PACK.pack &&

This fix looks good to me.

Thanks.

-Neeraj
Johannes Schindelin June 9, 2022, 9:30 a.m. UTC | #2
Hi,

On Thu, 9 Jun 2022, Han Xin wrote:

> Neeraj Singh[1] pointed out that if batch fsync is enabled, we should still
> call prepare_loose_object_bulk_checkin() to potentially create the bulk checkin
> objdir.
>
> 1. https://lore.kernel.org/git/7ba4858a-d1cc-a4eb-b6d6-4c04a5dd6ce7@gmail.com/
>
> Signed-off-by: Han Xin <chiyutianyi@gmail.com>

I like a good commit message that is concise and yet has all the necessary
information. Well done!

> ---
>  object-file.c                   |  3 +++
>  t/t5351-unpack-large-objects.sh | 15 ++++++++++++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/object-file.c b/object-file.c
> index 2dd828b45b..3a1be74775 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2131,6 +2131,9 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
>  	char hdr[MAX_HEADER_LEN];
>  	int hdrlen;
>
> +	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
> +		prepare_loose_object_bulk_checkin();
> +

Makes sense.

>  	/* Since oid is not determined, save tmp file to odb path. */
>  	strbuf_addf(&filename, "%s/", get_object_directory());
>  	hdrlen = format_object_header(hdr, sizeof(hdr), OBJ_BLOB, len);
> diff --git a/t/t5351-unpack-large-objects.sh b/t/t5351-unpack-large-objects.sh
> index 461ca060b2..a66a51f7df 100755
> --- a/t/t5351-unpack-large-objects.sh
> +++ b/t/t5351-unpack-large-objects.sh
> @@ -18,7 +18,10 @@ test_expect_success "create large objects (1.5 MB) and PACK" '
>  	test_commit --append foo big-blob &&
>  	test-tool genrandom bar 1500000 >big-blob &&
>  	test_commit --append bar big-blob &&
> -	PACK=$(echo HEAD | git pack-objects --revs pack)
> +	PACK=$(echo HEAD | git pack-objects --revs pack) &&
> +	git verify-pack -v pack-$PACK.pack |
> +	    grep -E "commit|tree|blob" |
> +		sed -n -e "s/^\([0-9a-f]*\).*/\1/p" >obj-list

Here, I would recommend avoiding the pipe, to ensure that we would catch
problems in the `verify-pack` invocation, and I think we can avoid the
`grep` altogether:

	git verify-pack -v pack-$PACK.pack >out &&
	sed -n 's/^\([0-9a-f][0-9a-f]*\).*\(commit\|tree\|blob\)/\1/p' \
		<out >obj-list

>  '
>
>  test_expect_success 'set memory limitation to 1MB' '
> @@ -45,6 +48,16 @@ test_expect_success 'unpack big object in stream' '
>  	test_dir_is_empty dest.git/objects/pack
>  '
>
> +BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch'
> +
> +test_expect_success 'unpack big object in stream (core.fsyncmethod=batch)' '
> +	prepare_dest 1m &&
> +	git $BATCH_CONFIGURATION -C dest.git unpack-objects <pack-$PACK.pack &&

I think the canonical way would be to use `test_config core.fsync ...`,
but the presented way works, too.

> +	test_dir_is_empty dest.git/objects/pack &&
> +	git -C dest.git cat-file --batch-check="%(objectname)" <obj-list >current &&

Good. The `--batch-check="%(objectname)"` part forces `cat-file` to read
the actual object.

> +	cmp obj-list current
> +'

My main question about this test case is whether it _actually_ verifies
that the batch-mode `fsync()`ing took place.

I kind of had expected to see Trace2 enabled and a `grep` for
`fsync/hardware-flush`. Do you think that would still make sense to add?

Thank you for working on the `fsync()` aspects of Git!
Dscho

> +
>  test_expect_success 'do not unpack existing large objects' '
>  	prepare_dest 1m &&
>  	git -C dest.git index-pack --stdin <pack-$PACK.pack &&
> --
> 2.36.1
>
>
Han Xin June 10, 2022, 12:55 p.m. UTC | #3
On Thu, Jun 9, 2022 at 5:30 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi,
>
> On Thu, 9 Jun 2022, Han Xin wrote:
>
> > Neeraj Singh[1] pointed out that if batch fsync is enabled, we should still
> > call prepare_loose_object_bulk_checkin() to potentially create the bulk checkin
> > objdir.
> >
> > 1. https://lore.kernel.org/git/7ba4858a-d1cc-a4eb-b6d6-4c04a5dd6ce7@gmail.com/
> >
> > Signed-off-by: Han Xin <chiyutianyi@gmail.com>
>
> I like a good commit message that is concise and yet has all the necessary
> information. Well done!
>
> > ---
> >  object-file.c                   |  3 +++
> >  t/t5351-unpack-large-objects.sh | 15 ++++++++++++++-
> >  2 files changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/object-file.c b/object-file.c
> > index 2dd828b45b..3a1be74775 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -2131,6 +2131,9 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
> >       char hdr[MAX_HEADER_LEN];
> >       int hdrlen;
> >
> > +     if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
> > +             prepare_loose_object_bulk_checkin();
> > +
>
> Makes sense.
>
> >       /* Since oid is not determined, save tmp file to odb path. */
> >       strbuf_addf(&filename, "%s/", get_object_directory());
> >       hdrlen = format_object_header(hdr, sizeof(hdr), OBJ_BLOB, len);
> > diff --git a/t/t5351-unpack-large-objects.sh b/t/t5351-unpack-large-objects.sh
> > index 461ca060b2..a66a51f7df 100755
> > --- a/t/t5351-unpack-large-objects.sh
> > +++ b/t/t5351-unpack-large-objects.sh
> > @@ -18,7 +18,10 @@ test_expect_success "create large objects (1.5 MB) and PACK" '
> >       test_commit --append foo big-blob &&
> >       test-tool genrandom bar 1500000 >big-blob &&
> >       test_commit --append bar big-blob &&
> > -     PACK=$(echo HEAD | git pack-objects --revs pack)
> > +     PACK=$(echo HEAD | git pack-objects --revs pack) &&
> > +     git verify-pack -v pack-$PACK.pack |
> > +         grep -E "commit|tree|blob" |
> > +             sed -n -e "s/^\([0-9a-f]*\).*/\1/p" >obj-list
>
> Here, I would recommend avoiding the pipe, to ensure that we would catch
> problems in the `verify-pack` invocation, and I think we can avoid the
> `grep` altogether:
>
>         git verify-pack -v pack-$PACK.pack >out &&
>         sed -n 's/^\([0-9a-f][0-9a-f]*\).*\(commit\|tree\|blob\)/\1/p' \
>                 <out >obj-list
>

Good suggestion. I will take it.

Thanks.
-Han Xin

> >  '
> >
> >  test_expect_success 'set memory limitation to 1MB' '
> > @@ -45,6 +48,16 @@ test_expect_success 'unpack big object in stream' '
> >       test_dir_is_empty dest.git/objects/pack
> >  '
> >
> > +BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch'
> > +
> > +test_expect_success 'unpack big object in stream (core.fsyncmethod=batch)' '
> > +     prepare_dest 1m &&
> > +     git $BATCH_CONFIGURATION -C dest.git unpack-objects <pack-$PACK.pack &&
>
> I think the canonical way would be to use `test_config core.fsync ...`,
> but the presented way works, too.
>
> > +     test_dir_is_empty dest.git/objects/pack &&
> > +     git -C dest.git cat-file --batch-check="%(objectname)" <obj-list >current &&
>
> Good. The `--batch-check="%(objectname)"` part forces `cat-file` to read
> the actual object.
>
> > +     cmp obj-list current
> > +'
>
> My main question about this test case is whether it _actually_ verifies
> that the batch-mode `fsync()`ing took place.
>
> I kind of had expected to see Trace2 enabled and a `grep` for
> `fsync/hardware-flush`. Do you think that would still make sense to add?
>
> Thank you for working on the `fsync()` aspects of Git!
> Dscho
>

More rigorous inspection should be adopted.

Thanks.
-Han Xin

> > +
> >  test_expect_success 'do not unpack existing large objects' '
> >       prepare_dest 1m &&
> >       git -C dest.git index-pack --stdin <pack-$PACK.pack &&
> > --
> > 2.36.1
> >
> >
diff mbox series

Patch

diff --git a/object-file.c b/object-file.c
index 2dd828b45b..3a1be74775 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2131,6 +2131,9 @@  int stream_loose_object(struct input_stream *in_stream, size_t len,
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen;
 
+	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
+		prepare_loose_object_bulk_checkin();
+
 	/* Since oid is not determined, save tmp file to odb path. */
 	strbuf_addf(&filename, "%s/", get_object_directory());
 	hdrlen = format_object_header(hdr, sizeof(hdr), OBJ_BLOB, len);
diff --git a/t/t5351-unpack-large-objects.sh b/t/t5351-unpack-large-objects.sh
index 461ca060b2..a66a51f7df 100755
--- a/t/t5351-unpack-large-objects.sh
+++ b/t/t5351-unpack-large-objects.sh
@@ -18,7 +18,10 @@  test_expect_success "create large objects (1.5 MB) and PACK" '
 	test_commit --append foo big-blob &&
 	test-tool genrandom bar 1500000 >big-blob &&
 	test_commit --append bar big-blob &&
-	PACK=$(echo HEAD | git pack-objects --revs pack)
+	PACK=$(echo HEAD | git pack-objects --revs pack) &&
+	git verify-pack -v pack-$PACK.pack |
+	    grep -E "commit|tree|blob" |
+		sed -n -e "s/^\([0-9a-f]*\).*/\1/p" >obj-list
 '
 
 test_expect_success 'set memory limitation to 1MB' '
@@ -45,6 +48,16 @@  test_expect_success 'unpack big object in stream' '
 	test_dir_is_empty dest.git/objects/pack
 '
 
+BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch'
+
+test_expect_success 'unpack big object in stream (core.fsyncmethod=batch)' '
+	prepare_dest 1m &&
+	git $BATCH_CONFIGURATION -C dest.git unpack-objects <pack-$PACK.pack &&
+	test_dir_is_empty dest.git/objects/pack &&
+	git -C dest.git cat-file --batch-check="%(objectname)" <obj-list >current &&
+	cmp obj-list current
+'
+
 test_expect_success 'do not unpack existing large objects' '
 	prepare_dest 1m &&
 	git -C dest.git index-pack --stdin <pack-$PACK.pack &&