Message ID | 1468974083-3660-1-git-send-email-david@fromorbit.com (mailing list archive) |
---|---|
State | Accepted, archived |
On Wed, Jul 20, 2016 at 10:21:23AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> In xfs_finish_page_writeback(), we have a loop that looks like this:
>
> 	do {
> 		if (off < bvec->bv_offset)
> 			goto next_bh;
> 		if (off > end)
> 			break;
> 		bh->b_end_io(bh, !error);
> 	next_bh:
> 		off += bh->b_size;
> 	} while ((bh = bh->b_this_page) != head);
>
> The b_end_io function is end_buffer_async_write(), which will call
> end_page_writeback() once all the buffers have been marked as no longer
> under IO. The issue here is that the only thing currently
> protecting both the bufferhead chain and the page from being
> reclaimed is the PageWriteback state held on the page.
>
> While we attempt to limit the loop to just the buffers covered by
> the IO, we still read from the buffer size and follow the next
> pointer in the bufferhead chain. There is no guarantee that either
> of these are valid after the PageWriteback flag has been cleared.
> Hence, loops like this are completely unsafe, and result in
> use-after-free issues. One such problem was caught by Calvin Owens
> with KASAN:
> ...
>
> Where the access is occurring during IO completion after the buffer
> had been freed from direct memory reclaim.
>
> Prevent use-after-free accidents in this end_io processing loop by
> pre-calculating the loop conditionals before calling bh->b_end_io().
> The loop is already limited to just the bufferheads covered by the
> IO in progress, so the offset checks are sufficient to prevent
> accessing buffers in the chain after end_page_writeback() has been
> called by the bh->b_end_io() callout.
>
> Yet another example of why Bufferheads Must Die.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reported-and-Tested-by: Calvin Owens <calvinowens@fb.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 80714eb..0cfb944 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -87,6 +87,12 @@ xfs_find_bdev_for_inode(
>   * We're now finished for good with this page.  Update the page state via the
>   * associated buffer_heads, paying attention to the start and end offsets that
>   * we need to process on the page.
> + *
> + * Landmine Warning: bh->b_end_io() will call end_page_writeback() on the last
> + * buffer in the IO. Once it does this, it is unsafe to access the bufferhead or
> + * the page at all, as we may be racing with memory reclaim and it can free both
> + * the bufferhead chain and the page as it will see the page as clean and
> + * unused.
>   */
>  static void
>  xfs_finish_page_writeback(
> @@ -95,8 +101,9 @@ xfs_finish_page_writeback(
>  	int			error)
>  {
>  	unsigned int		end = bvec->bv_offset + bvec->bv_len - 1;
> -	struct buffer_head	*head, *bh;
> +	struct buffer_head	*head, *bh, *next;
>  	unsigned int		off = 0;
> +	unsigned int		bsize;
>  
>  	ASSERT(bvec->bv_offset < PAGE_SIZE);
>  	ASSERT((bvec->bv_offset & ((1 << inode->i_blkbits) - 1)) == 0);
> @@ -105,15 +112,17 @@ xfs_finish_page_writeback(
>  
>  	bh = head = page_buffers(bvec->bv_page);
>  
> +	bsize = bh->b_size;
>  	do {
> +		next = bh->b_this_page;
>  		if (off < bvec->bv_offset)
>  			goto next_bh;
>  		if (off > end)
>  			break;
>  		bh->b_end_io(bh, !error);
>  next_bh:
> -		off += bh->b_size;
> -	} while ((bh = bh->b_this_page) != head);
> +		off += bsize;
> +	} while ((bh = next) != head);
>  }
>  
>  /*
> --
> 2.8.0.rc3
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
(should probably go into 4.7 still..)
On Thu, Jul 21, 2016 at 07:32:21AM -0700, Christoph Hellwig wrote:
> Looks fine,
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> (should probably go into 4.7 still..)

I'll tag it for -stable so it gets back there appropriately. The
problem exists in kernels from long before the changes we've made
recently, though, and I don't think this fix will apply cleanly to
older kernels...

Cheers,

Dave.
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 80714eb..0cfb944 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -87,6 +87,12 @@ xfs_find_bdev_for_inode(
  * We're now finished for good with this page.  Update the page state via the
  * associated buffer_heads, paying attention to the start and end offsets that
  * we need to process on the page.
+ *
+ * Landmine Warning: bh->b_end_io() will call end_page_writeback() on the last
+ * buffer in the IO. Once it does this, it is unsafe to access the bufferhead or
+ * the page at all, as we may be racing with memory reclaim and it can free both
+ * the bufferhead chain and the page as it will see the page as clean and
+ * unused.
  */
 static void
 xfs_finish_page_writeback(
@@ -95,8 +101,9 @@ xfs_finish_page_writeback(
 	int			error)
 {
 	unsigned int		end = bvec->bv_offset + bvec->bv_len - 1;
-	struct buffer_head	*head, *bh;
+	struct buffer_head	*head, *bh, *next;
 	unsigned int		off = 0;
+	unsigned int		bsize;
 
 	ASSERT(bvec->bv_offset < PAGE_SIZE);
 	ASSERT((bvec->bv_offset & ((1 << inode->i_blkbits) - 1)) == 0);
@@ -105,15 +112,17 @@ xfs_finish_page_writeback(
 
 	bh = head = page_buffers(bvec->bv_page);
 
+	bsize = bh->b_size;
 	do {
+		next = bh->b_this_page;
 		if (off < bvec->bv_offset)
 			goto next_bh;
 		if (off > end)
 			break;
 		bh->b_end_io(bh, !error);
 next_bh:
-		off += bh->b_size;
-	} while ((bh = bh->b_this_page) != head);
+		off += bsize;
+	} while ((bh = next) != head);
 }
 
 /*
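
The pattern the patch relies on (read everything you need from a buffer before invoking a callback that may free it) can be tried outside the kernel. What follows is only a rough userspace sketch, not kernel code: struct fake_bh, fake_end_io(), the checked accessors and run_demo() are all hypothetical names invented for illustration, and "reclaim" is simulated by setting a freed flag rather than by really freeing memory, so that stale reads can be counted instead of crashing. One ordering choice is specific to this sketch: finish_safe() tests off > end before it loads the next pointer, which differs slightly from the diff above, so that the iteration that breaks out of the loop does not touch the chain either.

```c
#include <assert.h>

#define NBUF	4	/* buffers per simulated page (assumption) */
#define BSIZE	1024u	/* bytes per buffer (assumption) */

/* Hypothetical userspace stand-in for struct buffer_head. */
struct fake_bh {
	struct fake_bh	*next;		/* plays the role of b_this_page */
	unsigned int	size;		/* plays the role of b_size */
	int		under_io;
	int		freed;		/* set once simulated reclaim ran */
};

static struct fake_bh pool[NBUF];
static int stale_reads;			/* reads of "freed" buffers */

/* Checked accessors: count every read from a buffer marked freed. */
static struct fake_bh *bh_next(struct fake_bh *bh)
{
	stale_reads += bh->freed;
	return bh->next;
}

static unsigned int bh_size(struct fake_bh *bh)
{
	stale_reads += bh->freed;
	return bh->size;
}

/* Link the pool into a ring and mark buffers in [start, end] as under IO. */
static void chain_init(unsigned int start, unsigned int end)
{
	for (int i = 0; i < NBUF; i++) {
		unsigned int off = i * BSIZE;

		pool[i].next = &pool[(i + 1) % NBUF];
		pool[i].size = BSIZE;
		pool[i].under_io = (off >= start && off <= end);
		pool[i].freed = 0;
	}
	stale_reads = 0;
}

/* When the last buffer under IO completes, pretend reclaim frees the chain. */
static void fake_end_io(struct fake_bh *bh)
{
	bh->under_io = 0;
	for (int i = 0; i < NBUF; i++)
		if (pool[i].under_io)
			return;
	for (int i = 0; i < NBUF; i++)
		pool[i].freed = 1;
}

/* Old loop shape: reads size and next pointer after the callback. */
static void finish_unsafe(unsigned int start, unsigned int end)
{
	struct fake_bh *head = &pool[0], *bh = head;
	unsigned int off = 0;

	do {
		if (off < start)
			goto next_bh;
		if (off > end)
			break;
		fake_end_io(bh);
next_bh:
		off += bh_size(bh);
	} while ((bh = bh_next(bh)) != head);
}

/* Fixed loop shape: size and next pointer are cached before the callback. */
static void finish_safe(unsigned int start, unsigned int end)
{
	struct fake_bh *head = &pool[0], *bh = head, *next;
	unsigned int off = 0;
	unsigned int bsize = bh_size(bh);

	do {
		if (off > end)
			break;
		next = bh_next(bh);
		if (off < start)
			goto next_bh;
		fake_end_io(bh);
next_bh:
		off += bsize;
	} while ((bh = next) != head);
}

/* Returns the number of stale reads for one simulated IO completion. */
static int run_demo(int safe)
{
	/* IO covers the second and third buffers: bytes 1024..3071. */
	chain_init(BSIZE, 3 * BSIZE - 1);
	if (safe)
		finish_safe(BSIZE, 3 * BSIZE - 1);
	else
		finish_unsafe(BSIZE, 3 * BSIZE - 1);
	assert(pool[0].freed);	/* simulated reclaim did run */
	return stale_reads;
}
```

Called from a main() of your own, run_demo(0) returns 2 (the size read and the next-pointer read that follow the final callback) and run_demo(1) returns 0. The freed flag is a deliberate simplification; in the kernel the equivalent accesses would hit memory that reclaim has actually released, which is what KASAN reported.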