diff mbox series

[v2,3/3] builtin/unpack-objects.c: change xwrite to write_in_full avoid truncation.

Message ID 20240227150934.7950-4-randall.becker@nexbridge.ca (mailing list archive)
State New, archived
Headers show
Series Change xwrite() to write_in_full() in builtins. | expand

Commit Message

Randall S. Becker Feb. 27, 2024, 3:09 p.m. UTC
From: "Randall S. Becker" <rsbecker@nexbridge.com>

This change is required because some platforms do not support file writes of
arbitrary sizes (e.g, NonStop). xwrite ends up truncating the output to the
maximum single I/O size possible for the destination device if the supplied
len value exceeds the supported value. Replacing xwrite with write_in_full
corrects this problem. Future optimisations could remove the loop in favour
of just calling write_in_full.

Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
---
 builtin/unpack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Junio C Hamano Feb. 27, 2024, 6:58 p.m. UTC | #1
"Randall S. Becker" <the.n.e.key@gmail.com> writes:

> From: "Randall S. Becker" <rsbecker@nexbridge.com>
>
> This change is required because some platforms do not support file writes of
> arbitrary sizes (e.g, NonStop). xwrite ends up truncating the output to the
> maximum single I/O size possible for the destination device if the supplied
> len value exceeds the supported value. Replacing xwrite with write_in_full
> corrects this problem. Future optimisations could remove the loop in favour
> of just calling write_in_full.
>
> Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
> ---
>  builtin/unpack-objects.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index e0a701f2b3..6935c4574e 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -680,7 +680,7 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED)
>  
>  	/* Write the last part of the buffer to stdout */
>  	while (len) {
> -		int ret = xwrite(1, buffer + offset, len);
> +		int ret = write_in_full(1, buffer + offset, len);
>  		if (ret <= 0)
>  			break;
>  		len -= ret;

Why do we need this with a retry loop that is prepared for short
write(2) specifically like this?

If xwrite() calls underlying write(2) with too large a value, then
your MAX_IO_SIZE is misconfigured, and the fix should go there, not
here in a loop that expects a working xwrite() that is allowed to
return on short write(2), I would think.
Randall S. Becker Feb. 27, 2024, 7:04 p.m. UTC | #2
On Tuesday, February 27, 2024 1:59 PM, Junio C Hamano wrote:
>"Randall S. Becker" <the.n.e.key@gmail.com> writes:
>
>> From: "Randall S. Becker" <rsbecker@nexbridge.com>
>>
>> This change is required because some platforms do not support file
>> writes of arbitrary sizes (e.g, NonStop). xwrite ends up truncating
>> the output to the maximum single I/O size possible for the destination
>> device if the supplied len value exceeds the supported value.
>> Replacing xwrite with write_in_full corrects this problem. Future
>> optimisations could remove the loop in favour of just calling
write_in_full.
>>
>> Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
>> ---
>>  builtin/unpack-objects.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index
>> e0a701f2b3..6935c4574e 100644
>> --- a/builtin/unpack-objects.c
>> +++ b/builtin/unpack-objects.c
>> @@ -680,7 +680,7 @@ int cmd_unpack_objects(int argc, const char
>> **argv, const char *prefix UNUSED)
>>
>>  	/* Write the last part of the buffer to stdout */
>>  	while (len) {
>> -		int ret = xwrite(1, buffer + offset, len);
>> +		int ret = write_in_full(1, buffer + offset, len);
>>  		if (ret <= 0)
>>  			break;
>>  		len -= ret;
>
>Why do we need this with a retry loop that is prepared for short
>write(2) specifically like this?
>
>If xwrite() calls underlying write(2) with too large a value, then your
MAX_IO_SIZE is misconfigured, and the fix should go there, not
>here in a loop that expects a working xwrite() that is allowed to return on
short write(2), I would think.

I experimented with using write_in_full vs. keeping xwrite. With xwrite in
this loop, t7704.9 consistently fails as described in the other thread. With
write_in_full, the code works correctly. I assume there are side-effects
that are present. This change is critical to having the code work on
NonStop. Otherwise git seems to be at risk of actually being seriously
broken if unpack does not work correctly. I am happy to have my series
ignored as long as the problem is otherwise corrected.
Jeff King Feb. 27, 2024, 7:25 p.m. UTC | #3
On Tue, Feb 27, 2024 at 02:04:46PM -0500, rsbecker@nexbridge.com wrote:

> >> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index
> >> e0a701f2b3..6935c4574e 100644
> >> --- a/builtin/unpack-objects.c
> >> +++ b/builtin/unpack-objects.c
> >> @@ -680,7 +680,7 @@ int cmd_unpack_objects(int argc, const char
> >> **argv, const char *prefix UNUSED)
> >>
> >>  	/* Write the last part of the buffer to stdout */
> >>  	while (len) {
> >> -		int ret = xwrite(1, buffer + offset, len);
> >> +		int ret = write_in_full(1, buffer + offset, len);
> >>  		if (ret <= 0)
> >>  			break;
> >>  		len -= ret;
> [...]
> I experimented with using write_in_full vs. keeping xwrite. With xwrite in
> this loop, t7704.9 consistently fails as described in the other thread. With
> write_in_full, the code works correctly. I assume there are side-effects
> that are present. This change is critical to having the code work on
> NonStop. Otherwise git seems to be at risk of actually being seriously
> broken if unpack does not work correctly. I am happy to have my series
> ignored as long as the problem is otherwise corrected.

I'm somewhat skeptical that this code is to blame, as it should be run
very rarely at all; it is just dumping any content in the pack stream
after the end of the checksum to stdout. But in normal use by Git, there
is no such content in the first place.

If I do this:

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index e0a701f2b3..affe55035d 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -680,11 +680,7 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED)
 
 	/* Write the last part of the buffer to stdout */
 	while (len) {
-		int ret = xwrite(1, buffer + offset, len);
-		if (ret <= 0)
-			break;
-		len -= ret;
-		offset += ret;
+		BUG("cruft at the end of the pack!");
 	}
 
 	/* All done */

then t7704 still passes, as it does not run this code at all. In fact,
nothing in the test suite fails. Which is not to say we should get rid
of those code. If we were writing today we might flag it as an error,
but we should keep it for historical compatibility.

But I do not see any bug in the code, and nor do I think it could
contribute to a test failure.

-Peff
Randall S. Becker Feb. 27, 2024, 9:05 p.m. UTC | #4
On Tuesday, February 27, 2024 2:26 PM, Peff wrote:
>To: rsbecker@nexbridge.com
>Cc: 'Junio C Hamano' <gitster@pobox.com>; 'Randall S. Becker' <the.n.e.key@gmail.com>; git@vger.kernel.org
>Subject: Re: [PATCH v2 3/3] builtin/unpack-objects.c: change xwrite to write_in_full avoid truncation.
>
>On Tue, Feb 27, 2024 at 02:04:46PM -0500, rsbecker@nexbridge.com wrote:
>
>> >> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
>> >> index e0a701f2b3..6935c4574e 100644
>> >> --- a/builtin/unpack-objects.c
>> >> +++ b/builtin/unpack-objects.c
>> >> @@ -680,7 +680,7 @@ int cmd_unpack_objects(int argc, const char
>> >> **argv, const char *prefix UNUSED)
>> >>
>> >>  	/* Write the last part of the buffer to stdout */
>> >>  	while (len) {
>> >> -		int ret = xwrite(1, buffer + offset, len);
>> >> +		int ret = write_in_full(1, buffer + offset, len);
>> >>  		if (ret <= 0)
>> >>  			break;
>> >>  		len -= ret;
>> [...]
>> I experimented with using write_in_full vs. keeping xwrite. With
>> xwrite in this loop, t7704.9 consistently fails as described in the
>> other thread. With write_in_full, the code works correctly. I assume
>> there are side-effects that are present. This change is critical to
>> having the code work on NonStop. Otherwise git seems to be at risk of
>> actually being seriously broken if unpack does not work correctly. I
>> am happy to have my series ignored as long as the problem is otherwise corrected.
>
>I'm somewhat skeptical that this code is to blame, as it should be run very rarely at all; it is just dumping any content in the pack stream
>after the end of the checksum to stdout. But in normal use by Git, there is no such content in the first place.
>
>If I do this:
>
>diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index e0a701f2b3..affe55035d 100644
>--- a/builtin/unpack-objects.c
>+++ b/builtin/unpack-objects.c
>@@ -680,11 +680,7 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED)
>
> 	/* Write the last part of the buffer to stdout */
> 	while (len) {
>-		int ret = xwrite(1, buffer + offset, len);
>-		if (ret <= 0)
>-			break;
>-		len -= ret;
>-		offset += ret;
>+		BUG("cruft at the end of the pack!");
> 	}
>
> 	/* All done */
>
>then t7704 still passes, as it does not run this code at all. In fact, nothing in the test suite fails. Which is not to say we should get rid of
>those code. If we were writing today we might flag it as an error, but we should keep it for historical compatibility.
>
>But I do not see any bug in the code, and nor do I think it could contribute to a test failure.

I have obviously gone down the wrong path trying to resolve this situation. Please consider this entire series dropped with my apologies for the time-waste.

Unfortunately, I do not have sufficient knowledge of the code to resolve the originally reported problem without further assistance to determine the root case (assuming it still is a problem). Changes in master post-2.44.0 appear to have contributed to resolving the situation, so I am now getting random pass/fail on the test. I'm going to hold 2.44.0 on ia64 and wait for a subsequent release at retest at that time.

Sadly,
--Randall
Jeff King March 7, 2024, 10 a.m. UTC | #5
On Tue, Feb 27, 2024 at 04:05:53PM -0500, rsbecker@nexbridge.com wrote:

> Unfortunately, I do not have sufficient knowledge of the code to
> resolve the originally reported problem without further assistance to
> determine the root case (assuming it still is a problem). Changes in
> master post-2.44.0 appear to have contributed to resolving the
> situation, so I am now getting random pass/fail on the test. I'm going
> to hold 2.44.0 on ia64 and wait for a subsequent release at retest at
> that time.

If you're getting random pass/fail (which does seem like the kind of
thing that could be related to pipe write() sizes), you might try using
the "--stress" argument. That can give you more consistent results while
bisecting (e.g., if "--stress" runs successfully for a few minutes).

That said, given the failing test you mentioned, I kind of assume that
it was not a code change that caused the problem, but rather a new test
exercising new code that happens to tickle your race.

-Peff
diff mbox series

Patch

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index e0a701f2b3..6935c4574e 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -680,7 +680,7 @@  int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED)
 
 	/* Write the last part of the buffer to stdout */
 	while (len) {
-		int ret = xwrite(1, buffer + offset, len);
+		int ret = write_in_full(1, buffer + offset, len);
 		if (ret <= 0)
 			break;
 		len -= ret;