Message ID | 1470935467-52772-2-git-send-email-bfoster@redhat.com (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
On Thu, Aug 11, 2016 at 01:11:03PM -0400, Brian Foster wrote: > The fix to log recovery to update the metadata LSN in recovered buffers > introduces the requirement that a buffer is submitted only once per > current LSN. Log recovery currently submits buffers on transaction > boundaries. This is not sufficient as the abstraction between log > records and transactions allows for various scenarios where multiple > transactions can share the same current LSN. If independent transactions > share an LSN and both modify the same buffer, log recovery can > incorrectly skip updates and leave the filesystem in an inconsisent > state. > > In preparation for proper metadata LSN updates during log recovery, > update log recovery to submit buffers for write on LSN change boundaries > rather than transaction boundaries. Explicitly track the current LSN in > a new struct xlog field to handle the various corner cases of when the > current LSN may or may not change. ..... > @@ -4221,8 +4223,39 @@ xlog_recover_process_ophdr( > return 0; > } > > + /* > + * Recovered buffers are submitted for I/O on current LSN change > + * boundaries. This is necessary to accommodate metadata LSN ordering > + * rules of v5 superblock filesystems. > + * > + * Store the new current LSN in l_recovery_lsn as we cannot rely on > + * either record boundaries or transaction boundaries alone to track LSN > + * changes. This has several contributing factors: > + * > + * - Metadata LSNs are updated at buffer submission time. Thus, buffers > + * can only be submitted safely once per current LSN value. > + * - The current LSN is defined as the start cycle/block of the first > + * record in which a transaction appears. > + * - A record can hold multiple transactions. Thus, a transaction change > + * does not guarantee a change in current LSN. > + * - A transaction can span multiple records. Thus, a record change does > + * not guarantee a change in current LSN. Consider the case where a > + * record holds one small transaction and a subsequent that carries > + * over to the next record. Both transactions share the same LSN as > + * per the definition of the current LSN. > + * > + * In summary, this means we must track the current LSN independently > + * and submit buffers for the previous LSN only when it has changed. > + */ > + if (log->l_recovery_lsn != trans->r_lsn) { > + error = xfs_buf_delwri_submit(buffer_list); > + if (error) > + return error; > + log->l_recovery_lsn = trans->r_lsn; > + } I'm not sure this is the right place to be submitting buffers. We can have multiple transactions open at once because the writing of the transaction to the log is split into two parts: xlog_write() which writes the changes to the log, and xfs_log_done() which writes the commit record (via xlog_commit_record()) to close the transaction. Hence we can get the situation where we have multiple open transactions such as: CA CB CC CD +---------+--------+--------+--+--+--------+-------+--+--+ trans A trans B trans C trans C trans D where the changes in multiple transactions are written before the ophdr that contains the commit record ("CA", "CB", ....) is written. With the above code, we'd be doing writeback of A when we see B, not when we see the commit record for A. Like wise B when we see C. And worse, partial writeback of C when we see the commit record for A... i.e. We are very careful to write commit records in the correct order because that is what determines recovery order, but we don't care what order we write the actual contents of the checkpoints or whether they interleave with other checkpoints. As such, ophdrs change transactions and LSNs without having actually completed recovery of a checkpoint. I think writeback should occur when all the transactions with a given lsn have been committed. I'm not sure there's a simple way to track and detect this, but using the ophdrs to detect a change of lsn to trigger buffer writeback does not look correct to me at this point in time. Cheers, Dave.
On Mon, Aug 29, 2016 at 11:16:31AM +1000, Dave Chinner wrote: > On Thu, Aug 11, 2016 at 01:11:03PM -0400, Brian Foster wrote: > > The fix to log recovery to update the metadata LSN in recovered buffers > > introduces the requirement that a buffer is submitted only once per > > current LSN. Log recovery currently submits buffers on transaction > > boundaries. This is not sufficient as the abstraction between log > > records and transactions allows for various scenarios where multiple > > transactions can share the same current LSN. If independent transactions > > share an LSN and both modify the same buffer, log recovery can > > incorrectly skip updates and leave the filesystem in an inconsisent > > state. > > > > In preparation for proper metadata LSN updates during log recovery, > > update log recovery to submit buffers for write on LSN change boundaries > > rather than transaction boundaries. Explicitly track the current LSN in > > a new struct xlog field to handle the various corner cases of when the > > current LSN may or may not change. > > ..... > > > @@ -4221,8 +4223,39 @@ xlog_recover_process_ophdr( > > return 0; > > } > > > > + /* > > + * Recovered buffers are submitted for I/O on current LSN change > > + * boundaries. This is necessary to accommodate metadata LSN ordering > > + * rules of v5 superblock filesystems. > > + * > > + * Store the new current LSN in l_recovery_lsn as we cannot rely on > > + * either record boundaries or transaction boundaries alone to track LSN > > + * changes. This has several contributing factors: > > + * > > + * - Metadata LSNs are updated at buffer submission time. Thus, buffers > > + * can only be submitted safely once per current LSN value. > > + * - The current LSN is defined as the start cycle/block of the first > > + * record in which a transaction appears. > > + * - A record can hold multiple transactions. Thus, a transaction change > > + * does not guarantee a change in current LSN. > > + * - A transaction can span multiple records. Thus, a record change does > > + * not guarantee a change in current LSN. Consider the case where a > > + * record holds one small transaction and a subsequent that carries > > + * over to the next record. Both transactions share the same LSN as > > + * per the definition of the current LSN. > > + * > > + * In summary, this means we must track the current LSN independently > > + * and submit buffers for the previous LSN only when it has changed. > > + */ > > + if (log->l_recovery_lsn != trans->r_lsn) { > > + error = xfs_buf_delwri_submit(buffer_list); > > + if (error) > > + return error; > > + log->l_recovery_lsn = trans->r_lsn; > > + } > > I'm not sure this is the right place to be submitting buffers. We > can have multiple transactions open at once because the writing of > the transaction to the log is split into two parts: xlog_write() > which writes the changes to the log, and xfs_log_done() which writes > the commit record (via xlog_commit_record()) to close the > transaction. > > Hence we can get the situation where we have multiple open > transactions such as: > > CA CB CC CD > +---------+--------+--------+--+--+--------+-------+--+--+ > trans A trans B trans C trans C trans D > Ok, thanks for drawing this out. I hadn't thought about it from this angle... > where the changes in multiple transactions are written before the > ophdr that contains the commit record ("CA", "CB", ....) is written. > With the above code, we'd be doing writeback of A when we see B, not > when we see the commit record for A. Like wise B when we see C. And > worse, partial writeback of C when we see the commit record for A... > ... but I don't think that accurately describes the behavior here. At least, I'm not sure there is enough information presented to describe when buffers are going to be submitted because we don't have the mapping of transactions to records. That aside for a moment, the current recovery code makes a couple passes over the entire dirty log. The first pass is for management of cancelled items and thus not really relevant to this problem. The second pass constructs all of the transactions in memory and on XLOG_COMMIT_TRANS, we iterate all of the items in said transaction, perform recovery on the actual filesystem metadata buffers and submit the buffers for I/O. As you've outlined above, I believe this corresponds to the commit record as this flag is written out by xlog_commit_record(). Given that, this change can't cause writeback of A when we see B because the buffers for A still aren't read, recovered and queued for I/O until we see the commit record for A. Instead, what this change does is to defer the final I/O submission from commit record time until we see a metadata LSN change. This effectively means that the current "per transaction" buffer delwri queue becomes a "per metadata LSN" queue and ensures that a subsequent transaction that happens to use the same LSN and modify the same block(s) does not submit the buffer for write multiple times from contexts that would stamp the same metadata LSN. In other words, this change can be viewed as batching or "plugging" buffer submissions from commit records to subsequent metadata LSN updates. It doesn't actually reorder how transactions are recovered in any ways. Getting back to the example above, what should happen (in pass 2) is: - Allocate trans A and start populating ->r_itemq. - Allocate trans B and start populating ->r_itemq. - Allocate trans C and start populating ->r_itemq. [Note: buffer_list is still empty so any current LSN updates cause no actions to this point.] - Hit CR A, recover all of the items in trans A and queue the buffers onto buffer_list. Now that buffer_list is populated, the buffers will be submitted on the next metadata LSN change. - Hit CR B. We don't know what's going to happen to buffer_list here because the example doesn't define record context. We'll look up transaction B and if transaction B started in the same record as A, for example, we won't actually drain buffer_list. If they started in different records (e.g., different LSNs), we drain buffer_list and proceed. In either case, we perform recovery of the items in trans B and queue B's buffers to buffer_list. - We revisit trans C. Again, the starting LSN of trans C and B determine whether we drain buffer_list. Note that buffer_list still only contains buffers up through transaction B (i.e., possibly A as well) as we have not yet recovered or queued anything from transaction C. - Allocate trans D and start populating ->r_itemq. ... ... and so on until the end and we drain buffer_list from the last LSN in xlog_do_recovery_pass(). > i.e. We are very careful to write commit records in the correct > order because that is what determines recovery order, but we don't > care what order we write the actual contents of the checkpoints or > whether they interleave with other checkpoints. As such, ophdrs > change transactions and LSNs without having actually completed > recovery of a checkpoint. > > I think writeback should occur when all the transactions with a > given lsn have been committed. I'm not sure there's a simple way to > track and detect this, but using the ophdrs to detect a change of > lsn to trigger buffer writeback does not look correct to me at this > point in time. > That is precisely the intent of this patch. What I think could be a problem is something like the following, if possible: CA CB CC CD +---------+--------+--+-------+--+--------+-------+--+--+ trans A trans B trans C trans C trans D Assume that trans A and trans B are within the same record and trans C is in a separate record. In that case, we commit trans A which populates buffer_list. We lookup trans C, note a new LSN and drain buffer_list. Then we ultimately commit trans B, which has the same metadata LSN as trans A and thus is a path to the original problem if trans B happened to modify any of the same blocks as trans A. Do note however that this is just an occurrence of the problem with log recovery as implemented today (once we update metadata LSNs, and is likely rare as I haven't been able to reproduce corruption in many tries). If that analysis is correct, I think a straightforward solution might be to defer submission to the lookup of a transaction with a new LSN that _also_ corresponds with processing of a commit record based on where we are in the on-disk log. E.g.: if (log->l_recovery_lsn != trans->r_lsn && oh_flags & XLOG_COMMIT_TRANS) { error = xfs_buf_delwri_submit(buffer_list); ... } So in the above, we'd submit buffers for A and B once we visit the commit record for trans C. Thoughts? Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
[ sorry to take so long to get back to this, Brian, I missed your reply and only yesterday when I was sorting out for-next updates that I still had this on my "for-review" patch stack. ] On Mon, Aug 29, 2016 at 02:17:22PM -0400, Brian Foster wrote: > On Mon, Aug 29, 2016 at 11:16:31AM +1000, Dave Chinner wrote: > > On Thu, Aug 11, 2016 at 01:11:03PM -0400, Brian Foster wrote: > > i.e. We are very careful to write commit records in the correct > > order because that is what determines recovery order, but we don't > > care what order we write the actual contents of the checkpoints or > > whether they interleave with other checkpoints. As such, ophdrs > > change transactions and LSNs without having actually completed > > recovery of a checkpoint. > > > > I think writeback should occur when all the transactions with a > > given lsn have been committed. I'm not sure there's a simple way to > > track and detect this, but using the ophdrs to detect a change of > > lsn to trigger buffer writeback does not look correct to me at this > > point in time. > > > > That is precisely the intent of this patch. What I think could be a > problem is something like the following, if possible: > > CA CB CC CD > +---------+--------+--+-------+--+--------+-------+--+--+ > trans A trans B trans C trans C trans D Yes, that's possible. > Assume that trans A and trans B are within the same record and trans C > is in a separate record. In that case, we commit trans A which populates > buffer_list. We lookup trans C, note a new LSN and drain buffer_list. > Then we ultimately commit trans B, which has the same metadata LSN as > trans A and thus is a path to the original problem if trans B happened > to modify any of the same blocks as trans A. Yes, that's right, we still are exposed to the same problem, and there's much more convoluted versions of it possible. > Do note however that this is just an occurrence of the problem with log > recovery as implemented today (once we update metadata LSNs, and is > likely rare as I haven't been able to reproduce corruption in many > tries). Yeah, it's damn hard to intentionally cause interleaving of checkpoint and commit records these days because of the delayed logging does aggregation in memory rather than in the log buffers themselves. > If that analysis is correct, I think a straightforward solution > might be to defer submission to the lookup of a transaction with a new > LSN that _also_ corresponds with processing of a commit record based on > where we are in the on-disk log. E.g.: > > if (log->l_recovery_lsn != trans->r_lsn && > oh_flags & XLOG_COMMIT_TRANS) { > error = xfs_buf_delwri_submit(buffer_list); > ... > } > > So in the above, we'd submit buffers for A and B once we visit the > commit record for trans C. Thoughts? Sounds plausible - let me just check I understood by repeating it back. Given the above case, we start with log->l_recovery_lsn set to the lsn before trans A and an empty buffer list. 1. We now recover trans A and trans B into their respective structures, but we don't don't add their dirty buffers to the delwri list yet - they are kept internal to the trans. 2. We then see commit A, and because the buffer list is empty we simply add them to the buffer list and update log->l_recovery_lsn to point at the transaction LSN. 3. We now see trans C, and start recovering it into an internal buffer list. 4. Then we process commit B, see that there are already queued buffers and so check the transaction LSN against log->l_recovery_lsn. They are the same, so we simply add the transactions dirty buffers to the buffer list. 5. We continue processing transaction C, and start on transaction D. We then see commit C. Buffer list is populated, so we check transaction lsn against log->l_recovery_lsn. They are different. At this point we know we have fully processed all the transactions that are associated with log->l_recovery_lsn, hence we can submit the buffer_list and mark it empty again. 6. At this point we jump back to step 2, this time processing commit C onwards.... 7. At the end of log recovery, we commit the remaining buffer list from the last transaction we recovered from the log. Did I understand it right? If so, I think this will work just fine. Thanks, Brian! -Dave.
On Tue, Sep 20, 2016 at 10:13:30AM +1000, Dave Chinner wrote: > [ sorry to take so long to get back to this, Brian, I missed your > reply and only yesterday when I was sorting out for-next updates > that I still had this on my "for-review" patch stack. ] > No problem. I've been away anyways.. > On Mon, Aug 29, 2016 at 02:17:22PM -0400, Brian Foster wrote: > > On Mon, Aug 29, 2016 at 11:16:31AM +1000, Dave Chinner wrote: > > > On Thu, Aug 11, 2016 at 01:11:03PM -0400, Brian Foster wrote: > > > i.e. We are very careful to write commit records in the correct > > > order because that is what determines recovery order, but we don't > > > care what order we write the actual contents of the checkpoints or > > > whether they interleave with other checkpoints. As such, ophdrs > > > change transactions and LSNs without having actually completed > > > recovery of a checkpoint. > > > > > > I think writeback should occur when all the transactions with a > > > given lsn have been committed. I'm not sure there's a simple way to > > > track and detect this, but using the ophdrs to detect a change of > > > lsn to trigger buffer writeback does not look correct to me at this > > > point in time. > > > > > > > That is precisely the intent of this patch. What I think could be a > > problem is something like the following, if possible: > > > > CA CB CC CD > > +---------+--------+--+-------+--+--------+-------+--+--+ > > trans A trans B trans C trans C trans D > > Yes, that's possible. > Ok. > > Assume that trans A and trans B are within the same record and trans C > > is in a separate record. In that case, we commit trans A which populates > > buffer_list. We lookup trans C, note a new LSN and drain buffer_list. > > Then we ultimately commit trans B, which has the same metadata LSN as > > trans A and thus is a path to the original problem if trans B happened > > to modify any of the same blocks as trans A. > > Yes, that's right, we still are exposed to the same problem, and > there's much more convoluted versions of it possible. > > > Do note however that this is just an occurrence of the problem with log > > recovery as implemented today (once we update metadata LSNs, and is > > likely rare as I haven't been able to reproduce corruption in many > > tries). > > Yeah, it's damn hard to intentionally cause interleaving of > checkpoint and commit records these days because of the delayed > logging does aggregation in memory rather than in the log buffers > themselves. > Makes sense. > > If that analysis is correct, I think a straightforward solution > > might be to defer submission to the lookup of a transaction with a new > > LSN that _also_ corresponds with processing of a commit record based on > > where we are in the on-disk log. E.g.: > > > > if (log->l_recovery_lsn != trans->r_lsn && > > oh_flags & XLOG_COMMIT_TRANS) { > > error = xfs_buf_delwri_submit(buffer_list); > > ... > > } > > > > So in the above, we'd submit buffers for A and B once we visit the > > commit record for trans C. Thoughts? > > Sounds plausible - let me just check I understood by repeating it > back. Given the above case, we start with log->l_recovery_lsn set to > the lsn before trans A and an empty buffer list. > > 1. We now recover trans A and trans B into their respective structures, > but we don't don't add their dirty buffers to the delwri list yet - > they are kept internal to the trans. > > 2. We then see commit A, and because the buffer list is empty we > simply add them to the buffer list and update log->l_recovery_lsn to > point at the transaction LSN. > Right... > 3. We now see trans C, and start recovering it into an internal buffer > list. > > 4. Then we process commit B, see that there are already queued buffers > and so check the transaction LSN against log->l_recovery_lsn. They > are the same, so we simply add the transactions dirty buffers to > the buffer list. > Maybe just weird wording here, but to be precise (and pedantic), the top-level check is for the current LSN change, not necessarily whether the buffer_list is empty or not. The behavior is the same either way. > 5. We continue processing transaction C, and start on transaction D. > We then see commit C. Buffer list is populated, so we check > transaction lsn against log->l_recovery_lsn. They are different. > At this point we know we have fully processed all the transactions > that are associated with log->l_recovery_lsn, hence we can submit > the buffer_list and mark it empty again. > > 6. At this point we jump back to step 2, this time processing commit > C onwards.... > > 7. At the end of log recovery, we commit the remaining buffer list > from the last transaction we recovered from the log. > > Did I understand it right? If so, I think this will work just fine. > Yep, I think so. I'll send an updated version. Brian > Thanks, Brian! > > -Dave. > -- > Dave Chinner > david@fromorbit.com > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 765f084..2b6eec5 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -413,7 +413,8 @@ struct xlog { /* log record crc error injection factor */ uint32_t l_badcrc_factor; #endif - + /* log recovery lsn tracking (for buffer submission */ + xfs_lsn_t l_recovery_lsn; }; #define XLOG_BUF_CANCEL_BUCKET(log, blkno) \ diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index e8638fd..97a0af9 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3846,14 +3846,13 @@ STATIC int xlog_recover_commit_trans( struct xlog *log, struct xlog_recover *trans, - int pass) + int pass, + struct list_head *buffer_list) { int error = 0; - int error2; int items_queued = 0; struct xlog_recover_item *item; struct xlog_recover_item *next; - LIST_HEAD (buffer_list); LIST_HEAD (ra_list); LIST_HEAD (done_list); @@ -3876,7 +3875,7 @@ xlog_recover_commit_trans( items_queued++; if (items_queued >= XLOG_RECOVER_COMMIT_QUEUE_MAX) { error = xlog_recover_items_pass2(log, trans, - &buffer_list, &ra_list); + buffer_list, &ra_list); list_splice_tail_init(&ra_list, &done_list); items_queued = 0; } @@ -3894,15 +3893,14 @@ out: if (!list_empty(&ra_list)) { if (!error) error = xlog_recover_items_pass2(log, trans, - &buffer_list, &ra_list); + buffer_list, &ra_list); list_splice_tail_init(&ra_list, &done_list); } if (!list_empty(&done_list)) list_splice_init(&done_list, &trans->r_itemq); - error2 = xfs_buf_delwri_submit(&buffer_list); - return error ? error : error2; + return error; } STATIC void @@ -4085,7 +4083,8 @@ xlog_recovery_process_trans( char *dp, unsigned int len, unsigned int flags, - int pass) + int pass, + struct list_head *buffer_list) { int error = 0; bool freeit = false; @@ -4109,7 +4108,8 @@ xlog_recovery_process_trans( error = xlog_recover_add_to_cont_trans(log, trans, dp, len); break; case XLOG_COMMIT_TRANS: - error = xlog_recover_commit_trans(log, trans, pass); + error = xlog_recover_commit_trans(log, trans, pass, + buffer_list); /* success or fail, we are now done with this transaction. */ freeit = true; break; @@ -4191,10 +4191,12 @@ xlog_recover_process_ophdr( struct xlog_op_header *ohead, char *dp, char *end, - int pass) + int pass, + struct list_head *buffer_list) { struct xlog_recover *trans; unsigned int len; + int error; /* Do we understand who wrote this op? */ if (ohead->oh_clientid != XFS_TRANSACTION && @@ -4221,8 +4223,39 @@ xlog_recover_process_ophdr( return 0; } + /* + * Recovered buffers are submitted for I/O on current LSN change + * boundaries. This is necessary to accommodate metadata LSN ordering + * rules of v5 superblock filesystems. + * + * Store the new current LSN in l_recovery_lsn as we cannot rely on + * either record boundaries or transaction boundaries alone to track LSN + * changes. This has several contributing factors: + * + * - Metadata LSNs are updated at buffer submission time. Thus, buffers + * can only be submitted safely once per current LSN value. + * - The current LSN is defined as the start cycle/block of the first + * record in which a transaction appears. + * - A record can hold multiple transactions. Thus, a transaction change + * does not guarantee a change in current LSN. + * - A transaction can span multiple records. Thus, a record change does + * not guarantee a change in current LSN. Consider the case where a + * record holds one small transaction and a subsequent that carries + * over to the next record. Both transactions share the same LSN as + * per the definition of the current LSN. + * + * In summary, this means we must track the current LSN independently + * and submit buffers for the previous LSN only when it has changed. + */ + if (log->l_recovery_lsn != trans->r_lsn) { + error = xfs_buf_delwri_submit(buffer_list); + if (error) + return error; + log->l_recovery_lsn = trans->r_lsn; + } + return xlog_recovery_process_trans(log, trans, dp, len, - ohead->oh_flags, pass); + ohead->oh_flags, pass, buffer_list); } /* @@ -4240,7 +4273,8 @@ xlog_recover_process_data( struct hlist_head rhash[], struct xlog_rec_header *rhead, char *dp, - int pass) + int pass, + struct list_head *buffer_list) { struct xlog_op_header *ohead; char *end; @@ -4262,7 +4296,7 @@ xlog_recover_process_data( /* errors will abort recovery */ error = xlog_recover_process_ophdr(log, rhash, rhead, ohead, - dp, end, pass); + dp, end, pass, buffer_list); if (error) return error; @@ -4685,7 +4719,8 @@ xlog_recover_process( struct hlist_head rhash[], struct xlog_rec_header *rhead, char *dp, - int pass) + int pass, + struct list_head *buffer_list) { int error; __le32 crc; @@ -4732,7 +4767,8 @@ xlog_recover_process( if (error) return error; - return xlog_recover_process_data(log, rhash, rhead, dp, pass); + return xlog_recover_process_data(log, rhash, rhead, dp, pass, + buffer_list); } STATIC int @@ -4793,9 +4829,11 @@ xlog_do_recovery_pass( char *offset; xfs_buf_t *hbp, *dbp; int error = 0, h_size, h_len; + int error2 = 0; int bblks, split_bblks; int hblks, split_hblks, wrapped_hblks; struct hlist_head rhash[XLOG_RHASH_SIZE]; + LIST_HEAD (buffer_list); ASSERT(head_blk != tail_blk); rhead_blk = 0; @@ -4981,7 +5019,7 @@ xlog_do_recovery_pass( } error = xlog_recover_process(log, rhash, rhead, offset, - pass); + pass, &buffer_list); if (error) goto bread_err2; @@ -5012,7 +5050,8 @@ xlog_do_recovery_pass( if (error) goto bread_err2; - error = xlog_recover_process(log, rhash, rhead, offset, pass); + error = xlog_recover_process(log, rhash, rhead, offset, pass, + &buffer_list); if (error) goto bread_err2; @@ -5025,10 +5064,17 @@ xlog_do_recovery_pass( bread_err1: xlog_put_bp(hbp); + /* + * Submit buffers that have been added from the last record processed, + * regardless of error status. + */ + if (!list_empty(&buffer_list)) + error2 = xfs_buf_delwri_submit(&buffer_list); + if (error && first_bad) *first_bad = rhead_blk; - return error; + return error ? error : error2; } /*
The fix to log recovery to update the metadata LSN in recovered buffers introduces the requirement that a buffer is submitted only once per current LSN. Log recovery currently submits buffers on transaction boundaries. This is not sufficient as the abstraction between log records and transactions allows for various scenarios where multiple transactions can share the same current LSN. If independent transactions share an LSN and both modify the same buffer, log recovery can incorrectly skip updates and leave the filesystem in an inconsisent state. In preparation for proper metadata LSN updates during log recovery, update log recovery to submit buffers for write on LSN change boundaries rather than transaction boundaries. Explicitly track the current LSN in a new struct xlog field to handle the various corner cases of when the current LSN may or may not change. Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/xfs_log_priv.h | 3 +- fs/xfs/xfs_log_recover.c | 82 +++++++++++++++++++++++++++++++++++++----------- 2 files changed, 66 insertions(+), 19 deletions(-)