[v3,0/5] ext4: fix inconsistency since async write metadata buffer error

Message ID	20200620025427.1756360-1-yi.zhang@huawei.com (mailing list archive)
Headers	show Return-Path: <SRS0=uxMm=AB=vger.kernel.org=linux-fsdevel-owner@kernel.org> From: "zhangyi (F)" <yi.zhang@huawei.com> To: <linux-ext4@vger.kernel.org>, <tytso@mit.edu>, <jack@suse.cz> CC: <adilger.kernel@dilger.ca>, <zhangxiaoxu5@huawei.com>, <yi.zhang@huawei.com>, <linux-fsdevel@vger.kernel.org> Subject: [PATCH v3 0/5] ext4: fix inconsistency since async write metadata buffer error Date: Sat, 20 Jun 2020 10:54:22 +0800 Message-ID: <20200620025427.1756360-1-yi.zhang@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk
Series	ext4: fix inconsistency since async write metadata buffer error \| expand [v3,0/5] ext4: fix inconsistency since async write metadata buffer error [v3,1/5] ext4: abort the filesystem if failed to async write metadata buffer [v3,2/5] ext4: remove ext4_buffer_uptodate() [v3,3/5] ext4: remove write io error check before read inode block [v3,4/5] jbd2: abort journal if free a async write error metadata buffer [v3,5/5] jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers()

Message ID

20200620025427.1756360-1-yi.zhang@huawei.com (mailing list archive)

Headers

From: "zhangyi (F)" <yi.zhang@huawei.com>
To: <linux-ext4@vger.kernel.org>, <tytso@mit.edu>, <jack@suse.cz>
CC: <adilger.kernel@dilger.ca>, <zhangxiaoxu5@huawei.com>,
        <yi.zhang@huawei.com>, <linux-fsdevel@vger.kernel.org>
Subject: [PATCH v3 0/5] ext4: fix inconsistency since async write metadata
 buffer error
Date: Sat, 20 Jun 2020 10:54:22 +0800
Message-ID: <20200620025427.1756360-1-yi.zhang@huawei.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7BIT
Content-Type: text/plain; charset=US-ASCII
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk

Series

ext4: fix inconsistency since async write metadata buffer error | expand

Message

Zhang Yi June 20, 2020, 2:54 a.m. UTC

Changes since v2:
 - Christoph against the solution of adding callback in the block layer
   that could let ext4 handle write error. So for simplicity, switch to
   check the bdev mapping->wb_err when ext4 getting journal write access
   as Jan suggested now. Maybe we could implement the callback through
   introduce a special inode (e.g. a meta inode) for ext4 in the future.
 - Patch 1: Add mapping->wb_err check and invoke ext4_error_err() in
   ext4_journal_get_write_access() if wb_err is different from the
   original one saved at mount time.
 - Patch 2-3: Remove partial fix <7963e5ac90125> and <9c83a923c67d>.
 - Patch 4: Fix another inconsistency problem since we may bypass the
   journal's checkpoint procedure if we free metadata buffers which
   were failed to async write out.
 - Patch 5: Just a cleanup patch.
   
The above 5 patches are based on linux-5.8-rc1 and have been tested by
xfstests, no newly increased failures.

Thanks,
Yi.

-----------------------

Original background
===================

This patch set point to fix the inconsistency problem which has been
discussed and partial fixed in [1].

Now, the problem is on the unstable storage which has a flaky transport
(e.g. iSCSI transport may disconnect few seconds and reconnect due to
the bad network environment), if we failed to async write metadata in
background, the end write routine in block layer will clear the buffer's
uptodate flag, but the data in such buffer is actually uptodate. Finally
we may read "old && inconsistent" metadata from the disk when we get the
buffer later because not only the uptodate flag was cleared but also we
do not check the write io error flag, or even worse the buffer has been
freed due to memory presure.

Fortunately, if the jbd2 do checkpoint after async IO error happens,
the checkpoint routine will check the write_io_error flag and abort the
the journal if detect IO error. And in the journal recover case, the
recover code will invoke sync_blockdev() after recover complete, it will
also detect IO error and refuse to mount the filesystem.

Current ext4 have already deal with this problem in __ext4_get_inode_loc()
and commit 7963e5ac90125 ("ext4: treat buffers with write errors as
containing valid data"), but it's not enough.

[1] https://lore.kernel.org/linux-ext4/20190823030207.GC8130@mit.edu/


zhangyi (F) (5):
  ext4: abort the filesystem if failed to async write metadata buffer
  ext4: remove ext4_buffer_uptodate()
  ext4: remove write io error check before read inode block
  jbd2: abort journal if free a async write error metadata buffer
  jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers()

 fs/ext4/ext4.h        | 16 +++-------------
 fs/ext4/ext4_jbd2.c   | 25 +++++++++++++++++++++++++
 fs/ext4/inode.c       | 15 +++------------
 fs/ext4/super.c       | 23 ++++++++++++++++++++---
 fs/jbd2/transaction.c | 20 ++++++++++++++------
 include/linux/jbd2.h  |  2 +-
 6 files changed, 66 insertions(+), 35 deletions(-)

Comments

Zhang Yi July 13, 2020, 1:40 a.m. UTC | #1

Hi, Ted and Jan, what do you think about this solution ?

Thanks,
Yi.

On 2020/6/20 10:54, zhangyi (F) wrote:
> Changes since v2:
>  - Christoph against the solution of adding callback in the block layer
>    that could let ext4 handle write error. So for simplicity, switch to
>    check the bdev mapping->wb_err when ext4 getting journal write access
>    as Jan suggested now. Maybe we could implement the callback through
>    introduce a special inode (e.g. a meta inode) for ext4 in the future.
>  - Patch 1: Add mapping->wb_err check and invoke ext4_error_err() in
>    ext4_journal_get_write_access() if wb_err is different from the
>    original one saved at mount time.
>  - Patch 2-3: Remove partial fix <7963e5ac90125> and <9c83a923c67d>.
>  - Patch 4: Fix another inconsistency problem since we may bypass the
>    journal's checkpoint procedure if we free metadata buffers which
>    were failed to async write out.
>  - Patch 5: Just a cleanup patch.
>    
> The above 5 patches are based on linux-5.8-rc1 and have been tested by
> xfstests, no newly increased failures.
> 
> Thanks,
> Yi.
> 
> -----------------------
> 
> Original background
> ===================
> 
> This patch set point to fix the inconsistency problem which has been
> discussed and partial fixed in [1].
> 
> Now, the problem is on the unstable storage which has a flaky transport
> (e.g. iSCSI transport may disconnect few seconds and reconnect due to
> the bad network environment), if we failed to async write metadata in
> background, the end write routine in block layer will clear the buffer's
> uptodate flag, but the data in such buffer is actually uptodate. Finally
> we may read "old && inconsistent" metadata from the disk when we get the
> buffer later because not only the uptodate flag was cleared but also we
> do not check the write io error flag, or even worse the buffer has been
> freed due to memory presure.
> 
> Fortunately, if the jbd2 do checkpoint after async IO error happens,
> the checkpoint routine will check the write_io_error flag and abort the
> the journal if detect IO error. And in the journal recover case, the
> recover code will invoke sync_blockdev() after recover complete, it will
> also detect IO error and refuse to mount the filesystem.
> 
> Current ext4 have already deal with this problem in __ext4_get_inode_loc()
> and commit 7963e5ac90125 ("ext4: treat buffers with write errors as
> containing valid data"), but it's not enough.
> 
> [1] https://lore.kernel.org/linux-ext4/20190823030207.GC8130@mit.edu/
> 
> 
> zhangyi (F) (5):
>   ext4: abort the filesystem if failed to async write metadata buffer
>   ext4: remove ext4_buffer_uptodate()
>   ext4: remove write io error check before read inode block
>   jbd2: abort journal if free a async write error metadata buffer
>   jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers()
> 
>  fs/ext4/ext4.h        | 16 +++-------------
>  fs/ext4/ext4_jbd2.c   | 25 +++++++++++++++++++++++++
>  fs/ext4/inode.c       | 15 +++------------
>  fs/ext4/super.c       | 23 ++++++++++++++++++++---
>  fs/jbd2/transaction.c | 20 ++++++++++++++------
>  include/linux/jbd2.h  |  2 +-
>  6 files changed, 66 insertions(+), 35 deletions(-)
>