[1/2] report: safely update the result.xml file

Message ID 20230706204232.1577765-1-tytso@mit.edu (mailing list archive)
State New, archived

Commit Message

Theodore Ts'o July 6, 2023, 8:42 p.m. UTC
After every single test, we rewrite result.xml from scratch.  This
ensures that the XML file is always in a valid, parseable state, even
if the check script is killed or the machine crashes in the middle of
a test.

If the test is being run in a Cloud VM as a "spot" (Amazon, Azure, or
GCE) or "preemptible" (Oracle) instance, it is possible that the VM
can be halted whenever the Cloud provider needs the capacity for
customers who are willing to pay full price.  ("Spot" instances can be
60% to 90% cheaper --- allowing the frugal kernel developer to get up
to 10 times more testing for the same amount of money.  :-)

Since a "spot" VM can get terminated at any time, it is possible for
the VM to be terminated immediately after a test has completed and
while the result.xml file is in the middle of being written out.  In
that case, the result.xml file could be partially written, resulting in
an invalid result.xml file and lost information about the tests run
before the VM was terminated.

To address this race, write the new result.xml file as result.xml.new,
and only rename it to result.xml after the XML file is fully written
out.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 common/report | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

Comments

Darrick J. Wong July 7, 2023, 3:01 p.m. UTC | #1
On Thu, Jul 06, 2023 at 04:42:31PM -0400, Theodore Ts'o wrote:
> After every single test, we rewrite result.xml from scratch.  This
> ensures that the XML file is always in a valid, parseable state, even
> if the check script is killed or the machine crashes in the middle of
> a test.
> 
> If the test is being run in a Cloud VM as a "spot" (Amazon, Azure, or
> GCE) or "preemptible" (Oracle) instance, it is possible that the VM
> can be halted whenever the Cloud provider needs the capacity for
> customers who are willing to pay full price.  ("Spot" instances can be
> 60% to 90% cheaper --- allowing the frugal kernel developer to get up
> to 10 times more testing for the same amount of money.  :-)
> 
> Since a "spot" VM can get terminated at any time, it is possible for
> the VM to be terminated immediately after a test has completed and
> while the result.xml file is in the middle of being written out.  In
> that case, the result.xml file could be partially written, resulting in
> an invalid result.xml file and lost information about the tests run
> before the VM was terminated.
> 
> To address this race, write the new result.xml file as result.xml.new,
> and only rename it to result.xml after the XML file is fully written
> out.
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
>  common/report | 19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/common/report b/common/report
> index 9bfa09ecc..3ad14f94e 100644
> --- a/common/report
> +++ b/common/report
> @@ -109,13 +109,15 @@ _xunit_make_section_report()
>  	local notrun_count="$4"
>  	local sect_time="$5"
>  	local timestamp
> +	local tmp_fn="$REPORT_DIR/result.xml.new"
> +	local out_fn="$REPORT_DIR/result.xml"
>  
>  	if [ $sect_name == '-no-sections-' ]; then
>  		sect_name='global'
>  	fi
>  	local report=$tmp.report.xunit.$sect_name.xml
>  	# Header
> -	echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > $REPORT_DIR/result.xml
> +	echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > "$tmp_fn"

Nit: You might want to rm -f $tmp_fn first to reduce the nastiness if
someone plants a named pipe at that path.

>  	if [ -n "$test_start_time" ]; then
>  		timestamp="$(date -Iseconds --date="$test_start_time")"
>  	else
> @@ -123,7 +125,7 @@ _xunit_make_section_report()
>  	fi
>  
>  	local fstests_ns="https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git"
> -	cat >> $REPORT_DIR/result.xml << ENDL
> +	cat >> "$tmp_fn" << ENDL
>  <testsuite
>   xmlns="$fstests_ns"
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> @@ -142,19 +144,20 @@ ENDL
>  	__generate_report_vars
>  
>  	# Properties
> -	echo -e "\t<properties>" >> $REPORT_DIR/result.xml
> +	echo -e "\t<properties>" >> "$tmp_fn"
>  	(for key in "${!REPORT_VARS[@]}"; do
>  		_xunit_add_property "$key" "${REPORT_VARS["$key"]}"
>  	done;
>  	for p in "${REPORT_ENV_LIST[@]}"; do
>  		_xunit_add_property "$p" "${!p}"
> -	done) | sort >> $REPORT_DIR/result.xml
> -	echo -e "\t</properties>" >> $REPORT_DIR/result.xml
> +	done) | sort >> "$tmp_fn"
> +	echo -e "\t</properties>" >> "$tmp_fn"
>  	if [ -f $report ]; then
> -		cat $report >> $REPORT_DIR/result.xml
> +		cat $report >> "$tmp_fn"
>  	fi
> -	echo "</testsuite>" >> $REPORT_DIR/result.xml
> -	echo "Xunit report: $REPORT_DIR/result.xml"
> +	echo "</testsuite>" >> "$tmp_fn"
> +	mv "$tmp_fn" "$out_fn"

Second nit: Make sure we actually wrote tmp_fn before blowing away the
old report.

sync "$tmp_fn" && mv "$tmp_fn" "$out_fn"

With that fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +	echo "Xunit report: $out_fn"
>  }
>  
>  _xunit_make_testcase_report()
> -- 
> 2.31.0
>
Zorro Lang July 7, 2023, 4:17 p.m. UTC | #2
On Fri, Jul 07, 2023 at 08:01:30AM -0700, Darrick J. Wong wrote:
> On Thu, Jul 06, 2023 at 04:42:31PM -0400, Theodore Ts'o wrote:
> > After every single test, we rewrite result.xml from scratch.  This
> > ensures that the XML file is always in a valid, parseable state, even
> > if the check script is killed or the machine crashes in the middle of
> > a test.
> > 
> > If the test is being run in a Cloud VM as a "spot" (Amazon, Azure, or
> > GCE) or "preemptible" (Oracle) instance, it is possible that the VM
> > can be halted whenever the Cloud provider needs the capacity for
> > customers who are willing to pay full price.  ("Spot" instances can be
> > 60% to 90% cheaper --- allowing the frugal kernel developer to get up
> > to 10 times more testing for the same amount of money.  :-)
> > 
> > Since a "spot" VM can get terminated at any time, it is possible for
> > the VM to be terminated immediately after a test has completed and
> > while the result.xml file is in the middle of being written out.  In
> > that case, the result.xml file could be partially written, resulting in
> > an invalid result.xml file and lost information about the tests run
> > before the VM was terminated.
> > 
> > To address this race, write the new result.xml file as result.xml.new,
> > and only rename it to result.xml after the XML file is fully written
> > out.
> > 
> > Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> > ---
> >  common/report | 19 +++++++++++--------
> >  1 file changed, 11 insertions(+), 8 deletions(-)
> > 
> > diff --git a/common/report b/common/report
> > index 9bfa09ecc..3ad14f94e 100644
> > --- a/common/report
> > +++ b/common/report
> > @@ -109,13 +109,15 @@ _xunit_make_section_report()
> >  	local notrun_count="$4"
> >  	local sect_time="$5"
> >  	local timestamp
> > +	local tmp_fn="$REPORT_DIR/result.xml.new"
> > +	local out_fn="$REPORT_DIR/result.xml"
> >  
> >  	if [ $sect_name == '-no-sections-' ]; then
> >  		sect_name='global'
> >  	fi
> >  	local report=$tmp.report.xunit.$sect_name.xml
> >  	# Header
> > -	echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > $REPORT_DIR/result.xml
> > +	echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > "$tmp_fn"
> 
> Nit: You might want to rm -f $tmp_fn first to reduce the nastiness if
> someone plants a named pipe at that path.
> 
> >  	if [ -n "$test_start_time" ]; then
> >  		timestamp="$(date -Iseconds --date="$test_start_time")"
> >  	else
> > @@ -123,7 +125,7 @@ _xunit_make_section_report()
> >  	fi
> >  
> >  	local fstests_ns="https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git"
> > -	cat >> $REPORT_DIR/result.xml << ENDL
> > +	cat >> "$tmp_fn" << ENDL
> >  <testsuite
> >   xmlns="$fstests_ns"
> >   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> > @@ -142,19 +144,20 @@ ENDL
> >  	__generate_report_vars
> >  
> >  	# Properties
> > -	echo -e "\t<properties>" >> $REPORT_DIR/result.xml
> > +	echo -e "\t<properties>" >> "$tmp_fn"
> >  	(for key in "${!REPORT_VARS[@]}"; do
> >  		_xunit_add_property "$key" "${REPORT_VARS["$key"]}"
> >  	done;
> >  	for p in "${REPORT_ENV_LIST[@]}"; do
> >  		_xunit_add_property "$p" "${!p}"
> > -	done) | sort >> $REPORT_DIR/result.xml
> > -	echo -e "\t</properties>" >> $REPORT_DIR/result.xml
> > +	done) | sort >> "$tmp_fn"
> > +	echo -e "\t</properties>" >> "$tmp_fn"
> >  	if [ -f $report ]; then
> > -		cat $report >> $REPORT_DIR/result.xml
> > +		cat $report >> "$tmp_fn"
> >  	fi
> > -	echo "</testsuite>" >> $REPORT_DIR/result.xml
> > -	echo "Xunit report: $REPORT_DIR/result.xml"
> > +	echo "</testsuite>" >> "$tmp_fn"
> > +	mv "$tmp_fn" "$out_fn"
> 
> Second nit: Make sure we actually wrote tmp_fn before blowing away the
> old report.
> 
> sync "$tmp_fn" && mv "$tmp_fn" "$out_fn"

I can help to add this when I merge it, if Ted doesn't want to change
more than that.

Just curious, will renameat2 ignore data that is still cached? I have never
done things like "sync && mv" before; is there any known issue, or is it as
expected?

Thanks,
Zorro


> 
> With that fixed,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> --D
> 
> > +	echo "Xunit report: $out_fn"
> >  }
> >  
> >  _xunit_make_testcase_report()
> > -- 
> > 2.31.0
> > 
>
Theodore Ts'o July 7, 2023, 7:13 p.m. UTC | #3
On Sat, Jul 08, 2023 at 12:17:47AM +0800, Zorro Lang wrote:
> > sync "$tmp_fn" && mv "$tmp_fn" "$out_fn"
> 
> I can help to add this when I merge it, if Ted doesn't want to change
> more than that.

I'll be sending out v2 shortly.

> Just curious, will renameat2 ignore data that is still cached? I have never
> done things like "sync && mv" before; is there any known issue, or is it as
> expected?

POSIX does not require rename() or renameat2() to do anything with any
cached data.  All it does is make changes to the directory entries
involved.   And that's what the VFS Layer does.

Some file system implementations *may* do something special on a
rename(2), but it's not required by any specification.  For example,
in the wake of the O_PONIES controversy, ext4 will initiate (but not
wait for) writeback of the file being renamed *if* it is being
renamed on top of another file, causing the destination inode to be
unlinked and deleted if it doesn't have any other hard links to keep
its refcount above 0.

This means that in the case of renaming result.xml.new on top of
result.xml, if result.xml exists, this will trigger an immediate
writeback of result.xml.new, instead of waiting for the 30 second
writeback timer --- if this is happening on ext4.  We added this to
ext4 because when people would exit the old Tuxracer game, this would
rewrite the top ten file in an unsafe manner, and then close the
OpenGL handle --- and on kernels with buggy proprietary graphics
drivers, it would cause a kernel crash, causing the top-ten score file
to be lost, and causing many users to complain to the file system
developers (but curiously enough, not to the application/game
developers).  So we added something which reduces the chances of this
being an issue, but it's not something anyone should count upon.

As far as sync && mv, that's the shell script equivalent of what
careful userspace programs should always do, which is to write the
file to foo.new, then fsync it, and then rename foo.new to foo,
optionally renaming foo to foo.old or foo~ beforehand.  Text editors
have always gotten this right, because programmers get salty when
their source files get trashed.  For whatever reason, desktop
applications and games tend not to be as careful....
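
The write, fsync, rename sequence described above can be sketched in
shell roughly as follows.  This is only an illustration under stated
assumptions: atomic_write is a hypothetical name (it is not an fstests
function), the new contents arrive on stdin, and "sync FILE" flushing a
single file assumes GNU coreutils 8.24 or later.

```shell
# Hypothetical helper, not part of fstests: replace "$1" with the
# contents of stdin so that readers see either the old or the new file.
atomic_write()
{
	local dest="$1"
	local tmp="$dest.new"

	rm -f "$tmp" || return 1	# drop a stale temp file or planted FIFO
	cat > "$tmp" || return 1	# write the complete new contents
	sync "$tmp" || return 1		# flush the temp file's data to disk
	mv "$tmp" "$dest"		# rename(2) on top of the old file
}
```

It could be called as: printf '<testsuite/>\n' | atomic_write result.xml
If the sync fails, the old file is left untouched.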

Cheers,

						- Ted
Zorro Lang July 8, 2023, 3:02 a.m. UTC | #4
On Fri, Jul 07, 2023 at 03:13:31PM -0400, Theodore Ts'o wrote:
> On Sat, Jul 08, 2023 at 12:17:47AM +0800, Zorro Lang wrote:
> > > sync "$tmp_fn" && mv "$tmp_fn" "$out_fn"
> > 
> > I can help to add this when I merge it, if Ted doesn't want to change
> > more than that.
> 
> I'll be sending out v2 shortly.
> 
> > Just curious, will renameat2 ignore data still in cached? I never did things
> > likes "sync && mv" before, is there any known issue or it's as expected?
> 
> POSIX does not require rename() or renameat2() to do anything with any
> cached data.  All it does is make changes to the directory entries
> involved.   And that's what the VFS Layer does.
> 
> Some file system implementations *may* do something special on a
> rename(2), but it's not required by any specification.  For example,
> in the wake of the O_PONIES controversy, ext4 will initiate (but not
> wait for) writeback of the file being renamed *if* it is being
> renamed on top of another file, causing the destination inode to be
> unlinked and deleted if it doesn't have any other hard links to keep
> its refcount above 0.
> 
> This means that in the case of renaming result.xml.new on top of
> result.xml, if result.xml exists, this will trigger an immediate
> writeback of result.xml.new, instead of waiting for the 30 second
> writeback timer --- if this is happening on ext4.  We added this to
> ext4 because when people would exit the old Tuxracer game, this would
> rewrite the top ten file in an unsafe manner, and then close the
> OpenGL handle --- and on kernels with buggy proprietary graphics
> drivers, it would cause a kernel crash, causing the top-ten score file
> to be lost, and causing many users to complain to the file system
> developers (but curiously enough, not to the application/game
> developers).  So we added something which reduces the chances of this
> being an issue, but it's not something anyone should count upon.
> 
> As far as sync && mv, that's the shell script equivalent of what
> careful userspace programs should always do, which is to write the
> file to foo.new, then fsync it, and then rename foo.new to foo,
> optionally renaming foo to foo.old or foo~ beforehand.  Text editors
> have always gotten this right, because programmers get salty when
> their source files get trashed.  For whatever reason, desktop
> applications and games tend not to be as careful....

Thanks Ted, good to know that! Maybe we can have a "_mv" in common/rc that
does "sync $1 && mv $1 $2".
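
A minimal sketch of what such a helper might look like, assuming it
takes a source and a destination path.  No "_mv" function exists in
common/rc today; the name and argument order here just follow the
one-liner above, with quoting added so paths containing spaces work.

```shell
# Hypothetical "_mv" helper (not in common/rc): flush the source file's
# data, then rename it.  The rename is skipped if the sync fails.
_mv()
{
	local src="$1"
	local dst="$2"

	sync "$src" && mv "$src" "$dst"
}
```

For the report code this would be used as: _mv "$tmp_fn" "$out_fn"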

Thanks,
Zorro

> 
> Cheers,
> 
> 						- Ted
>
Theodore Ts'o July 8, 2023, 4:31 a.m. UTC | #5
On Sat, Jul 08, 2023 at 11:02:05AM +0800, Zorro Lang wrote:
>
> Thanks Ted, good to know that! Maybe we can have a "_mv" in common/rc that
> does "sync $1 && mv $1 $2".

Perhaps, although I can't think of any other place in xfstests where we
are actively (and repeatedly) renaming one file on top of another one,
or rewriting a file in place.

There are certainly places where we use "sed -i" to rewrite a file in
place, and it does write the output to a new file and then rename
it to the original file's name (yay!), although it doesn't do an fsync
before the rename (boo!).  That's arguably a bug report we should send
to the maintainer of the sed package.
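
Until then, a script that cares could do the edit by hand.  The sketch
below is an assumption-laden illustration, not something fstests does:
sed_inplace_synced is a made-up name, "$file.new" is an arbitrary temp
name, and unlike "sed -i" it makes no attempt to preserve the original
file's permissions or ownership.

```shell
# Hypothetical replacement for "sed -i": write to a temp file, sync it,
# and only then rename it over the original.
sed_inplace_synced()
{
	local script="$1"
	local file="$2"
	local tmp="$file.new"

	sed -e "$script" "$file" > "$tmp" || return 1
	sync "$tmp" && mv "$tmp" "$file"
}
```

Example call: sed_inplace_synced 's/foo/bar/' local.config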

						- Ted

Patch

diff --git a/common/report b/common/report
index 9bfa09ecc..3ad14f94e 100644
--- a/common/report
+++ b/common/report
@@ -109,13 +109,15 @@  _xunit_make_section_report()
 	local notrun_count="$4"
 	local sect_time="$5"
 	local timestamp
+	local tmp_fn="$REPORT_DIR/result.xml.new"
+	local out_fn="$REPORT_DIR/result.xml"
 
 	if [ $sect_name == '-no-sections-' ]; then
 		sect_name='global'
 	fi
 	local report=$tmp.report.xunit.$sect_name.xml
 	# Header
-	echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > $REPORT_DIR/result.xml
+	echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > "$tmp_fn"
 	if [ -n "$test_start_time" ]; then
 		timestamp="$(date -Iseconds --date="$test_start_time")"
 	else
@@ -123,7 +125,7 @@  _xunit_make_section_report()
 	fi
 
 	local fstests_ns="https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git"
-	cat >> $REPORT_DIR/result.xml << ENDL
+	cat >> "$tmp_fn" << ENDL
 <testsuite
  xmlns="$fstests_ns"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
@@ -142,19 +144,20 @@  ENDL
 	__generate_report_vars
 
 	# Properties
-	echo -e "\t<properties>" >> $REPORT_DIR/result.xml
+	echo -e "\t<properties>" >> "$tmp_fn"
 	(for key in "${!REPORT_VARS[@]}"; do
 		_xunit_add_property "$key" "${REPORT_VARS["$key"]}"
 	done;
 	for p in "${REPORT_ENV_LIST[@]}"; do
 		_xunit_add_property "$p" "${!p}"
-	done) | sort >> $REPORT_DIR/result.xml
-	echo -e "\t</properties>" >> $REPORT_DIR/result.xml
+	done) | sort >> "$tmp_fn"
+	echo -e "\t</properties>" >> "$tmp_fn"
 	if [ -f $report ]; then
-		cat $report >> $REPORT_DIR/result.xml
+		cat $report >> "$tmp_fn"
 	fi
-	echo "</testsuite>" >> $REPORT_DIR/result.xml
-	echo "Xunit report: $REPORT_DIR/result.xml"
+	echo "</testsuite>" >> "$tmp_fn"
+	mv "$tmp_fn" "$out_fn"
+	echo "Xunit report: $out_fn"
 }
 
 _xunit_make_testcase_report()