Message ID | 20230706204232.1577765-1-tytso@mit.edu (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] report: safely update the result.xml file | expand |
On Thu, Jul 06, 2023 at 04:42:31PM -0400, Theodore Ts'o wrote: > After every single test, we rewrite result.xml from scratch. This > ensures that the XML file is always in a valid, parseable state, even > if the check script is killed or the machine crashes in the middle of > a test. > > If the test is being run in a Cloud VM as a "spot" (Amazon, Azure, or > GCE) or "preemptible" (Oracle) instance, it is possible that the VM > can be halted whenever the Cloud provider needs the capacity for > customers who are willing to pay full price. ("Spot" instances can be > 60% to 90% cheaper --- allowing the frugal kernel developer to get up > to 10 times more testing for the same amount of money. :-) > > Since a "spot" VM can get terminated at any time, it is possible for > the VM to be terminated immediately after a test has completed and > while the result.xml file is in the middle of being written out. In > that case, the result.xml file could partially written, resulting in > an invalid result.xml file and lost information about the tests run > before the VM was terminated. > > To address this race, write the new result.xml file as result.xml.new, > and only rename it to result.xml after the XML file is fully written > out. > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > --- > common/report | 19 +++++++++++-------- > 1 file changed, 11 insertions(+), 8 deletions(-) > > diff --git a/common/report b/common/report > index 9bfa09ecc..3ad14f94e 100644 > --- a/common/report > +++ b/common/report > @@ -109,13 +109,15 @@ _xunit_make_section_report() > local notrun_count="$4" > local sect_time="$5" > local timestamp > + local tmp_fn="$REPORT_DIR/result.xml.new" > + local out_fn="$REPORT_DIR/result.xml" > > if [ $sect_name == '-no-sections-' ]; then > sect_name='global' > fi > local report=$tmp.report.xunit.$sect_name.xml > # Header > - echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > $REPORT_DIR/result.xml > + echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > "$tmp_fn" Nit: You might want to rm -f $tmp_fn first to reduce the nastiness if someone plants a named pipe at that path. > if [ -n "$test_start_time" ]; then > timestamp="$(date -Iseconds --date="$test_start_time")" > else > @@ -123,7 +125,7 @@ _xunit_make_section_report() > fi > > local fstests_ns="https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git" > - cat >> $REPORT_DIR/result.xml << ENDL > + cat >> "$tmp_fn" << ENDL > <testsuite > xmlns="$fstests_ns" > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > @@ -142,19 +144,20 @@ ENDL > __generate_report_vars > > # Properties > - echo -e "\t<properties>" >> $REPORT_DIR/result.xml > + echo -e "\t<properties>" >> "$tmp_fn" > (for key in "${!REPORT_VARS[@]}"; do > _xunit_add_property "$key" "${REPORT_VARS["$key"]}" > done; > for p in "${REPORT_ENV_LIST[@]}"; do > _xunit_add_property "$p" "${!p}" > - done) | sort >> $REPORT_DIR/result.xml > - echo -e "\t</properties>" >> $REPORT_DIR/result.xml > + done) | sort >> "$tmp_fn" > + echo -e "\t</properties>" >> "$tmp_fn" > if [ -f $report ]; then > - cat $report >> $REPORT_DIR/result.xml > + cat $report >> "$tmp_fn" > fi > - echo "</testsuite>" >> $REPORT_DIR/result.xml > - echo "Xunit report: $REPORT_DIR/result.xml" > + echo "</testsuite>" >> "$tmp_fn" > + mv "$tmp_fn" "$out_fn" Second nit: Make sure we actually wrote tmp_fn before blowing away the old report. sync "$tmp_fn" && mv "$tmp_fn" "$out_fn" With that fixed, Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > + echo "Xunit report: $out_fn" > } > > _xunit_make_testcase_report() > -- > 2.31.0 >
On Fri, Jul 07, 2023 at 08:01:30AM -0700, Darrick J. Wong wrote: > On Thu, Jul 06, 2023 at 04:42:31PM -0400, Theodore Ts'o wrote: > > After every single test, we rewrite result.xml from scratch. This > > ensures that the XML file is always in a valid, parseable state, even > > if the check script is killed or the machine crashes in the middle of > > a test. > > > > If the test is being run in a Cloud VM as a "spot" (Amazon, Azure, or > > GCE) or "preemptible" (Oracle) instance, it is possible that the VM > > can be halted whenever the Cloud provider needs the capacity for > > customers who are willing to pay full price. ("Spot" instances can be > > 60% to 90% cheaper --- allowing the frugal kernel developer to get up > > to 10 times more testing for the same amount of money. :-) > > > > Since a "spot" VM can get terminated at any time, it is possible for > > the VM to be terminated immediately after a test has completed and > > while the result.xml file is in the middle of being written out. In > > that case, the result.xml file could partially written, resulting in > > an invalid result.xml file and lost information about the tests run > > before the VM was terminated. > > > > To address this race, write the new result.xml file as result.xml.new, > > and only rename it to result.xml after the XML file is fully written > > out. > > > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > > --- > > common/report | 19 +++++++++++-------- > > 1 file changed, 11 insertions(+), 8 deletions(-) > > > > diff --git a/common/report b/common/report > > index 9bfa09ecc..3ad14f94e 100644 > > --- a/common/report > > +++ b/common/report > > @@ -109,13 +109,15 @@ _xunit_make_section_report() > > local notrun_count="$4" > > local sect_time="$5" > > local timestamp > > + local tmp_fn="$REPORT_DIR/result.xml.new" > > + local out_fn="$REPORT_DIR/result.xml" > > > > if [ $sect_name == '-no-sections-' ]; then > > sect_name='global' > > fi > > local report=$tmp.report.xunit.$sect_name.xml > > # Header > > - echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > $REPORT_DIR/result.xml > > + echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > "$tmp_fn" > > Nit: You might want to rm -f $tmp_fn first to reduce the nastiness if > someone plants a named pipe at that path. > > > if [ -n "$test_start_time" ]; then > > timestamp="$(date -Iseconds --date="$test_start_time")" > > else > > @@ -123,7 +125,7 @@ _xunit_make_section_report() > > fi > > > > local fstests_ns="https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git" > > - cat >> $REPORT_DIR/result.xml << ENDL > > + cat >> "$tmp_fn" << ENDL > > <testsuite > > xmlns="$fstests_ns" > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > > @@ -142,19 +144,20 @@ ENDL > > __generate_report_vars > > > > # Properties > > - echo -e "\t<properties>" >> $REPORT_DIR/result.xml > > + echo -e "\t<properties>" >> "$tmp_fn" > > (for key in "${!REPORT_VARS[@]}"; do > > _xunit_add_property "$key" "${REPORT_VARS["$key"]}" > > done; > > for p in "${REPORT_ENV_LIST[@]}"; do > > _xunit_add_property "$p" "${!p}" > > - done) | sort >> $REPORT_DIR/result.xml > > - echo -e "\t</properties>" >> $REPORT_DIR/result.xml > > + done) | sort >> "$tmp_fn" > > + echo -e "\t</properties>" >> "$tmp_fn" > > if [ -f $report ]; then > > - cat $report >> $REPORT_DIR/result.xml > > + cat $report >> "$tmp_fn" > > fi > > - echo "</testsuite>" >> $REPORT_DIR/result.xml > > - echo "Xunit report: $REPORT_DIR/result.xml" > > + echo "</testsuite>" >> "$tmp_fn" > > + mv "$tmp_fn" "$out_fn" > > Second nit: Make sure we actually wrote tmp_fn before blowing away the > old report. > > sync "$tmp_fn" && mv "$tmp_fn" "$out_fn" I can help to add this when I merge it, if Ted doesn't want to change more than that. Just curious, will renameat2 ignore data still in cached? I never did things likes "sync && mv" before, is there any known issue or it's as expected? Thanks, Zorro > > With that fixed, > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > --D > > > + echo "Xunit report: $out_fn" > > } > > > > _xunit_make_testcase_report() > > -- > > 2.31.0 > > >
On Sat, Jul 08, 2023 at 12:17:47AM +0800, Zorro Lang wrote: > > sync "$tmp_fn" && mv "$tmp_fn" "$out_fn" > > I can help to add this when I merge it, if Ted doesn't want to change > more than that. I'll be sending out v2 shortly. > Just curious, will renameat2 ignore data still in cached? I never did things > likes "sync && mv" before, is there any known issue or it's as expected? POSIX does not require rename() or renameat2() to do anything with any cached data. All it does is make changes to the directory entries involved. And that's what the VFS Layer does. Some file systems implementation *may* do something special on a rename(2), but it's not required by any specification. For example, in the wake of the O_PONIES controversy, ext4 will initiate (but not wait for) the file writeback of file being renamed *if* it is being renamed on top of another file, causing the destination inode to be unlinked and deleted if it doesn't have any other hard links to keep its reflink about 0. This means that in the case of renaming result.xml.new on top of result.xml, if result.xml exists, this will trigger an immediate writeback of result.xml.new, instead of waiting for the 30 second writeback timer --- if this is happening on ext4. We added this to ext4 because when people would exit the old Tuxracer game, this would rewrite the top ten file in an unsafe manner, and then close the OpenGL handle --- and on kernels with buggy proprietary graphics drivers, it would cause a kernel crash, causing the top-ten score file to be lost, and causing many users to complain to the file system developers (but curiously enough, not to the application/game developers). So we added something which reduces the chances of this being an issue, but it's not something anyone should count upon. As far as sync && mv, that's the shell script equivalent of what careful userspace programs should always do, which is to write the file to foo.new, then fsync it, and then rename foo.new to foo, optionally renaming foo to foo.old or foo~ beforehand. Text editors have always gotten this right, because programmers get salty when their source files get trashed. For whatever reason, desktop application and games tend not be as careful.... Cheers, - Ted
On Fri, Jul 07, 2023 at 03:13:31PM -0400, Theodore Ts'o wrote: > On Sat, Jul 08, 2023 at 12:17:47AM +0800, Zorro Lang wrote: > > > sync "$tmp_fn" && mv "$tmp_fn" "$out_fn" > > > > I can help to add this when I merge it, if Ted doesn't want to change > > more than that. > > I'll be sending out v2 shortly. > > > Just curious, will renameat2 ignore data still in cached? I never did things > > likes "sync && mv" before, is there any known issue or it's as expected? > > POSIX does not require rename() or renameat2() to do anything with any > cached data. All it does is make changes to the directory entries > involved. And that's what the VFS Layer does. > > Some file systems implementation *may* do something special on a > rename(2), but it's not required by any specification. For example, > in the wake of the O_PONIES controversy, ext4 will initiate (but not > wait for) the file writeback of file being renamed *if* it is being > renamed on top of another file, causing the destination inode to be > unlinked and deleted if it doesn't have any other hard links to keep > its reflink about 0. > > This means that in the case of renaming result.xml.new on top of > result.xml, if result.xml exists, this will trigger an immediate > writeback of result.xml.new, instead of waiting for the 30 second > writeback timer --- if this is happening on ext4. We added this to > ext4 because when people would exit the old Tuxracer game, this would > rewrite the top ten file in an unsafe manner, and then close the > OpenGL handle --- and on kernels with buggy proprietary graphics > drivers, it would cause a kernel crash, causing the top-ten score file > to be lost, and causing many users to complain to the file system > developers (but curiously enough, not to the application/game > developers). So we added something which reduces the chances of this > being an issue, but it's not something anyone should count upon. > > As far as sync && mv, that's the shell script equivalent of what > careful userspace programs should always do, which is to write the > file to foo.new, then fsync it, and then rename foo.new to foo, > optionally renaming foo to foo.old or foo~ beforehand. Text editors > have always gotten this right, because programmers get salty when > their source files get trashed. For whatever reason, desktop > application and games tend not be as careful.... Thanks Ted, good to know that! Maybe we can have a "_mv" in common/rc does "sync $1 && mv $1 $2". Thanks, Zorro > > Cheers, > > - Ted >
On Sat, Jul 08, 2023 at 11:02:05AM +0800, Zorro Lang wrote: > > Thanks Ted, good to know that! Maybe we can have a "_mv" in common/rc does > "sync $1 && mv $1 $2". Perhaps, although I can't think of any other place in xfstests were we are actively (and repeatedly) renaming one file on top of another one, or rewriting a file in place. There are certainly places where we use "sed -i" to rewrite a file in place, and it does do write the file to a new file, and then renames it to the original file's name (yay!), although it doesn't do an fsync before the rename (boo!). That's arguably a bug report we should send to the maintainer of the sed package. - Ted
diff --git a/common/report b/common/report index 9bfa09ecc..3ad14f94e 100644 --- a/common/report +++ b/common/report @@ -109,13 +109,15 @@ _xunit_make_section_report() local notrun_count="$4" local sect_time="$5" local timestamp + local tmp_fn="$REPORT_DIR/result.xml.new" + local out_fn="$REPORT_DIR/result.xml" if [ $sect_name == '-no-sections-' ]; then sect_name='global' fi local report=$tmp.report.xunit.$sect_name.xml # Header - echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > $REPORT_DIR/result.xml + echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > "$tmp_fn" if [ -n "$test_start_time" ]; then timestamp="$(date -Iseconds --date="$test_start_time")" else @@ -123,7 +125,7 @@ _xunit_make_section_report() fi local fstests_ns="https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git" - cat >> $REPORT_DIR/result.xml << ENDL + cat >> "$tmp_fn" << ENDL <testsuite xmlns="$fstests_ns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" @@ -142,19 +144,20 @@ ENDL __generate_report_vars # Properties - echo -e "\t<properties>" >> $REPORT_DIR/result.xml + echo -e "\t<properties>" >> "$tmp_fn" (for key in "${!REPORT_VARS[@]}"; do _xunit_add_property "$key" "${REPORT_VARS["$key"]}" done; for p in "${REPORT_ENV_LIST[@]}"; do _xunit_add_property "$p" "${!p}" - done) | sort >> $REPORT_DIR/result.xml - echo -e "\t</properties>" >> $REPORT_DIR/result.xml + done) | sort >> "$tmp_fn" + echo -e "\t</properties>" >> "$tmp_fn" if [ -f $report ]; then - cat $report >> $REPORT_DIR/result.xml + cat $report >> "$tmp_fn" fi - echo "</testsuite>" >> $REPORT_DIR/result.xml - echo "Xunit report: $REPORT_DIR/result.xml" + echo "</testsuite>" >> "$tmp_fn" + mv "$tmp_fn" "$out_fn" + echo "Xunit report: $out_fn" } _xunit_make_testcase_report()
After every single test, we rewrite result.xml from scratch. This ensures that the XML file is always in a valid, parseable state, even if the check script is killed or the machine crashes in the middle of a test. If the test is being run in a Cloud VM as a "spot" (Amazon, Azure, or GCE) or "preemptible" (Oracle) instance, it is possible that the VM can be halted whenever the Cloud provider needs the capacity for customers who are willing to pay full price. ("Spot" instances can be 60% to 90% cheaper --- allowing the frugal kernel developer to get up to 10 times more testing for the same amount of money. :-) Since a "spot" VM can get terminated at any time, it is possible for the VM to be terminated immediately after a test has completed and while the result.xml file is in the middle of being written out. In that case, the result.xml file could partially written, resulting in an invalid result.xml file and lost information about the tests run before the VM was terminated. To address this race, write the new result.xml file as result.xml.new, and only rename it to result.xml after the XML file is fully written out. Signed-off-by: Theodore Ts'o <tytso@mit.edu> --- common/report | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)