Message ID | 20210426220552.45413-2-junxiao.bi@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] fs/buffer.c: add new api to allow eof writeback | expand |
Hi Joseph, Can you help review the first two patches? Thanks, Junxiao. On 4/26/21 3:05 PM, Junxiao Bi wrote: > When fallocate/truncate extend inode size, if the original isize is in > the middle of last cluster, then the part from isize to the end of the > cluster needs to be zeroed with buffer write, at that time isize is not > yet updated to match the new size, if writeback is kicked in, it will > invoke ocfs2_writepage()->block_write_full_page() where the pages out > of inode size will be dropped. That will cause file corruption. > > Running the following command with qemu-image 4.2.1 can get a corrupted > coverted image file easily. > > qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ > -O qcow2 -o compat=1.1 $qcow_image.conv > > Cc: <stable@vger.kernel.org> > Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> > --- > fs/ocfs2/aops.c | 19 ++++++++++++++++++- > 1 file changed, 18 insertions(+), 1 deletion(-) > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > index ad20403b383f..7a3e3d59f6a9 100644 > --- a/fs/ocfs2/aops.c > +++ b/fs/ocfs2/aops.c > @@ -402,11 +402,28 @@ static void ocfs2_readahead(struct readahead_control *rac) > */ > static int ocfs2_writepage(struct page *page, struct writeback_control *wbc) > { > + struct inode * const inode = page->mapping->host; > + loff_t i_size = i_size_read(inode); > + const pgoff_t end_index = i_size >> PAGE_SHIFT; > + unsigned int offset; > + > trace_ocfs2_writepage( > (unsigned long long)OCFS2_I(page->mapping->host)->ip_blkno, > page->index); > > - return block_write_full_page(page, ocfs2_get_block, wbc); > + /* > + * The page straddles i_size. It must be zeroed out on each and every > + * writepage invocation because it may be mmapped. "A file is mapped > + * in multiples of the page size. For a file that is not a multiple of > + * the page size, the remaining memory is zeroed when mapped, and > + * writes to that region are not written out to the file." > + */ > + offset = i_size & (PAGE_SIZE-1); > + if (page->index == end_index && offset) > + zero_user_segment(page, offset, PAGE_SIZE); > + > + return __block_write_full_page_eof(inode, page, ocfs2_get_block, wbc, > + end_buffer_async_write, true); > } > > /* Taken from ext3. We don't necessarily need the full blown
On 4/27/21 6:05 AM, Junxiao Bi wrote: > When fallocate/truncate extend inode size, if the original isize is in > the middle of last cluster, then the part from isize to the end of the > cluster needs to be zeroed with buffer write, at that time isize is not > yet updated to match the new size, if writeback is kicked in, it will > invoke ocfs2_writepage()->block_write_full_page() where the pages out > of inode size will be dropped. That will cause file corruption. > > Running the following command with qemu-image 4.2.1 can get a corrupted > coverted image file easily. > > qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ > -O qcow2 -o compat=1.1 $qcow_image.conv > > Cc: <stable@vger.kernel.org> > Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> > --- > fs/ocfs2/aops.c | 19 ++++++++++++++++++- > 1 file changed, 18 insertions(+), 1 deletion(-) > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > index ad20403b383f..7a3e3d59f6a9 100644 > --- a/fs/ocfs2/aops.c > +++ b/fs/ocfs2/aops.c > @@ -402,11 +402,28 @@ static void ocfs2_readahead(struct readahead_control *rac) > */ > static int ocfs2_writepage(struct page *page, struct writeback_control *wbc) > { > + struct inode * const inode = page->mapping->host; > + loff_t i_size = i_size_read(inode); > + const pgoff_t end_index = i_size >> PAGE_SHIFT; > + unsigned int offset; > + > trace_ocfs2_writepage( > (unsigned long long)OCFS2_I(page->mapping->host)->ip_blkno, > page->index); > > - return block_write_full_page(page, ocfs2_get_block, wbc); > + /* > + * The page straddles i_size. It must be zeroed out on each and every > + * writepage invocation because it may be mmapped. "A file is mapped > + * in multiples of the page size. For a file that is not a multiple of > + * the page size, the remaining memory is zeroed when mapped, and > + * writes to that region are not written out to the file." > + */ > + offset = i_size & (PAGE_SIZE-1); > + if (page->index == end_index && offset) > + zero_user_segment(page, offset, PAGE_SIZE); > + > + return __block_write_full_page_eof(inode, page, ocfs2_get_block, wbc, > + end_buffer_async_write, true); > } > > /* Taken from ext3. We don't necessarily need the full blown >
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index ad20403b383f..7a3e3d59f6a9 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -402,11 +402,28 @@ static void ocfs2_readahead(struct readahead_control *rac) */ static int ocfs2_writepage(struct page *page, struct writeback_control *wbc) { + struct inode * const inode = page->mapping->host; + loff_t i_size = i_size_read(inode); + const pgoff_t end_index = i_size >> PAGE_SHIFT; + unsigned int offset; + trace_ocfs2_writepage( (unsigned long long)OCFS2_I(page->mapping->host)->ip_blkno, page->index); - return block_write_full_page(page, ocfs2_get_block, wbc); + /* + * The page straddles i_size. It must be zeroed out on each and every + * writepage invocation because it may be mmapped. "A file is mapped + * in multiples of the page size. For a file that is not a multiple of + * the page size, the remaining memory is zeroed when mapped, and + * writes to that region are not written out to the file." + */ + offset = i_size & (PAGE_SIZE-1); + if (page->index == end_index && offset) + zero_user_segment(page, offset, PAGE_SIZE); + + return __block_write_full_page_eof(inode, page, ocfs2_get_block, wbc, + end_buffer_async_write, true); } /* Taken from ext3. We don't necessarily need the full blown
When fallocate/truncate extend inode size, if the original isize is in the middle of last cluster, then the part from isize to the end of the cluster needs to be zeroed with buffer write, at that time isize is not yet updated to match the new size, if writeback is kicked in, it will invoke ocfs2_writepage()->block_write_full_page() where the pages out of inode size will be dropped. That will cause file corruption. Running the following command with qemu-image 4.2.1 can get a corrupted coverted image file easily. qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ -O qcow2 -o compat=1.1 $qcow_image.conv Cc: <stable@vger.kernel.org> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> --- fs/ocfs2/aops.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)