Message ID | 20230105233908.1030651-1-jaegeuk@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [f2fs-dev] f2fs: retry to update the inode page given EIO | expand |
In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered. As a result, we don't need to stop checkpoint right away given single EIO. Cc: stable@vger.kernel.org Signed-off-by: Randall Huang <huangrandall@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> --- Change log from v1: - fix a bug fs/f2fs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index ff6cf66ed46b..2ed7a621fdf1 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode) if (IS_ERR(node_page)) { int err = PTR_ERR(node_page); - if (err == -ENOMEM) { + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) { cond_resched(); goto retry; } else if (err != -ENOENT) {
On 2023/1/11 9:20, Jaegeuk Kim wrote: > In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with > f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered. > As a result, we don't need to stop checkpoint right away given single EIO. f2fs_handle_page_eio() only covers the case that EIO occurs on the same page, should we cover the case EIO occurs on different pages? Thanks, > > Cc: stable@vger.kernel.org > Signed-off-by: Randall Huang <huangrandall@google.com> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > --- > > Change log from v1: > - fix a bug > > fs/f2fs/inode.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c > index ff6cf66ed46b..2ed7a621fdf1 100644 > --- a/fs/f2fs/inode.c > +++ b/fs/f2fs/inode.c > @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode) > if (IS_ERR(node_page)) { > int err = PTR_ERR(node_page); > > - if (err == -ENOMEM) { > + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) { > cond_resched(); > goto retry; > } else if (err != -ENOENT) {
On 01/11, Chao Yu wrote: > On 2023/1/11 9:20, Jaegeuk Kim wrote: > > In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with > > f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered. > > As a result, we don't need to stop checkpoint right away given single EIO. > > f2fs_handle_page_eio() only covers the case that EIO occurs on the same > page, should we cover the case EIO occurs on different pages? Which case are you looking at? > > Thanks, > > > > > Cc: stable@vger.kernel.org > > Signed-off-by: Randall Huang <huangrandall@google.com> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > > --- > > > > Change log from v1: > > - fix a bug > > > > fs/f2fs/inode.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c > > index ff6cf66ed46b..2ed7a621fdf1 100644 > > --- a/fs/f2fs/inode.c > > +++ b/fs/f2fs/inode.c > > @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode) > > if (IS_ERR(node_page)) { > > int err = PTR_ERR(node_page); > > - if (err == -ENOMEM) { > > + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) { > > cond_resched(); > > goto retry; > > } else if (err != -ENOENT) {
In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered. As a result, we don't need to stop checkpoint right away given single EIO. Cc: stable@vger.kernel.org Signed-off-by: Randall Huang <huangrandall@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> --- Change log from v2: - set EIO to cover the case of data corruption given by buggy UFS driver fs/f2fs/inode.c | 2 +- fs/f2fs/node.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index ff6cf66ed46b..2ed7a621fdf1 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode) if (IS_ERR(node_page)) { int err = PTR_ERR(node_page); - if (err == -ENOMEM) { + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) { cond_resched(); goto retry; } else if (err != -ENOENT) { diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 558b318f7316..1629dc300c61 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1455,7 +1455,7 @@ static struct page *__get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid, ofs_of_node(page), cpver_of_node(page), next_blkaddr_of_node(page)); set_sbi_flag(sbi, SBI_NEED_FSCK); - err = -EINVAL; + err = -EIO; out_err: ClearPageUptodate(page); out_put_err:
On 2023/1/12 2:50, Jaegeuk Kim wrote: > On 01/11, Chao Yu wrote: >> On 2023/1/11 9:20, Jaegeuk Kim wrote: >>> In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with >>> f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered. >>> As a result, we don't need to stop checkpoint right away given single EIO. >> >> f2fs_handle_page_eio() only covers the case that EIO occurs on the same >> page, should we cover the case EIO occurs on different pages? > > Which case are you looking at? - __get_node_page(PageA) - __get_node_page(PageB) - f2fs_handle_page_eio - sbi->page_eio_ofs[type] = PageA->index - f2fs_handle_page_eio - sbi->page_eio_ofs[type] = PageB->index In such race case, it may has low probability to set CP_ERROR_FLAG as we expect? Thanks, > >> >> Thanks, >> >>> >>> Cc: stable@vger.kernel.org >>> Signed-off-by: Randall Huang <huangrandall@google.com> >>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> >>> --- >>> >>> Change log from v1: >>> - fix a bug >>> >>> fs/f2fs/inode.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c >>> index ff6cf66ed46b..2ed7a621fdf1 100644 >>> --- a/fs/f2fs/inode.c >>> +++ b/fs/f2fs/inode.c >>> @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode) >>> if (IS_ERR(node_page)) { >>> int err = PTR_ERR(node_page); >>> - if (err == -ENOMEM) { >>> + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) { >>> cond_resched(); >>> goto retry; >>> } else if (err != -ENOENT) {
On 01/12, Chao Yu wrote: > On 2023/1/12 2:50, Jaegeuk Kim wrote: > > On 01/11, Chao Yu wrote: > > > On 2023/1/11 9:20, Jaegeuk Kim wrote: > > > > In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with > > > > f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered. > > > > As a result, we don't need to stop checkpoint right away given single EIO. > > > > > > f2fs_handle_page_eio() only covers the case that EIO occurs on the same > > > page, should we cover the case EIO occurs on different pages? > > > > Which case are you looking at? > > - __get_node_page(PageA) - __get_node_page(PageB) > - f2fs_handle_page_eio > - sbi->page_eio_ofs[type] = PageA->index > - f2fs_handle_page_eio > - sbi->page_eio_ofs[type] = PageB->index > > In such race case, it may has low probability to set CP_ERROR_FLAG as we expect? Do you see that case in products? I'm trying to avoid setting CP_ERROR_FLAG here. > > Thanks, > > > > > > > > > Thanks, > > > > > > > > > > > Cc: stable@vger.kernel.org > > > > Signed-off-by: Randall Huang <huangrandall@google.com> > > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > > > > --- > > > > > > > > Change log from v1: > > > > - fix a bug > > > > > > > > fs/f2fs/inode.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c > > > > index ff6cf66ed46b..2ed7a621fdf1 100644 > > > > --- a/fs/f2fs/inode.c > > > > +++ b/fs/f2fs/inode.c > > > > @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode) > > > > if (IS_ERR(node_page)) { > > > > int err = PTR_ERR(node_page); > > > > - if (err == -ENOMEM) { > > > > + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) { > > > > cond_resched(); > > > > goto retry; > > > > } else if (err != -ENOENT) {
On 2023/1/13 8:01, Jaegeuk Kim wrote: > On 01/12, Chao Yu wrote: >> On 2023/1/12 2:50, Jaegeuk Kim wrote: >>> On 01/11, Chao Yu wrote: >>>> On 2023/1/11 9:20, Jaegeuk Kim wrote: >>>>> In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with >>>>> f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered. >>>>> As a result, we don't need to stop checkpoint right away given single EIO. >>>> >>>> f2fs_handle_page_eio() only covers the case that EIO occurs on the same >>>> page, should we cover the case EIO occurs on different pages? >>> >>> Which case are you looking at? >> >> - __get_node_page(PageA) - __get_node_page(PageB) >> - f2fs_handle_page_eio >> - sbi->page_eio_ofs[type] = PageA->index >> - f2fs_handle_page_eio >> - sbi->page_eio_ofs[type] = PageB->index >> >> In such race case, it may has low probability to set CP_ERROR_FLAG as we expect? > > Do you see that case in products? Not yet. > I'm trying to avoid setting CP_ERROR_FLAG here. Copied, how about using the same logic for node page as meta page, like we did in f2fs_get_meta_page_retry()? Thanks, > >> >> Thanks, >> >>> >>>> >>>> Thanks, >>>> >>>>> >>>>> Cc: stable@vger.kernel.org >>>>> Signed-off-by: Randall Huang <huangrandall@google.com> >>>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> >>>>> --- >>>>> >>>>> Change log from v1: >>>>> - fix a bug >>>>> >>>>> fs/f2fs/inode.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c >>>>> index ff6cf66ed46b..2ed7a621fdf1 100644 >>>>> --- a/fs/f2fs/inode.c >>>>> +++ b/fs/f2fs/inode.c >>>>> @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode) >>>>> if (IS_ERR(node_page)) { >>>>> int err = PTR_ERR(node_page); >>>>> - if (err == -ENOMEM) { >>>>> + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) { >>>>> cond_resched(); >>>>> goto retry; >>>>> } else if (err != -ENOENT) {
If the storage gives a corrupted node block due to short power failure and
reset, f2fs stops the entire operations by setting the checkpoint failure flag.
Let's give more chances to live by re-issuing IOs for a while in such critical
path.
Cc: stable@vger.kernel.org
Suggested-by: Randall Huang <huangrandall@google.com>
Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
Change log from v2:
- adopting the retry logic
fs/f2fs/inode.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index e1d6e021e82b..7bf660d4cad9 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -724,18 +724,19 @@ void f2fs_update_inode_page(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct page *node_page;
+ int count = 0;
retry:
node_page = f2fs_get_node_page(sbi, inode->i_ino);
if (IS_ERR(node_page)) {
int err = PTR_ERR(node_page);
- if (err == -ENOMEM) {
- cond_resched();
+ /* The node block was truncated. */
+ if (err == -ENOENT)
+ return;
+
+ if (err == -ENOMEM || ++count <= DEFAULT_RETRY_IO_COUNT)
goto retry;
- } else if (err != -ENOENT) {
- f2fs_stop_checkpoint(sbi, false,
- STOP_CP_REASON_UPDATE_INODE);
- }
+ f2fs_stop_checkpoint(sbi, false, STOP_CP_REASON_UPDATE_INODE);
return;
}
f2fs_update_inode(inode, node_page);
On 01/28, Chao Yu wrote: > On 2023/1/13 8:01, Jaegeuk Kim wrote: > > On 01/12, Chao Yu wrote: > > > On 2023/1/12 2:50, Jaegeuk Kim wrote: > > > > On 01/11, Chao Yu wrote: > > > > > On 2023/1/11 9:20, Jaegeuk Kim wrote: > > > > > > In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with > > > > > > f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered. > > > > > > As a result, we don't need to stop checkpoint right away given single EIO. > > > > > > > > > > f2fs_handle_page_eio() only covers the case that EIO occurs on the same > > > > > page, should we cover the case EIO occurs on different pages? > > > > > > > > Which case are you looking at? > > > > > > - __get_node_page(PageA) - __get_node_page(PageB) > > > - f2fs_handle_page_eio > > > - sbi->page_eio_ofs[type] = PageA->index > > > - f2fs_handle_page_eio > > > - sbi->page_eio_ofs[type] = PageB->index > > > > > > In such race case, it may has low probability to set CP_ERROR_FLAG as we expect? > > > > Do you see that case in products? > > Not yet. > > > I'm trying to avoid setting CP_ERROR_FLAG here. > > Copied, how about using the same logic for node page as meta page, like > we did in f2fs_get_meta_page_retry()? Yeah.. let me send v3 and test a bit. Thanks~ :) > > Thanks, > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > Cc: stable@vger.kernel.org > > > > > > Signed-off-by: Randall Huang <huangrandall@google.com> > > > > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > > > > > > --- > > > > > > > > > > > > Change log from v1: > > > > > > - fix a bug > > > > > > > > > > > > fs/f2fs/inode.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c > > > > > > index ff6cf66ed46b..2ed7a621fdf1 100644 > > > > > > --- a/fs/f2fs/inode.c > > > > > > +++ b/fs/f2fs/inode.c > > > > > > @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode) > > > > > > if (IS_ERR(node_page)) { > > > > > > int err = PTR_ERR(node_page); > > > > > > - if (err == -ENOMEM) { > > > > > > + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) { > > > > > > cond_resched(); > > > > > > goto retry; > > > > > > } else if (err != -ENOENT) {
On 2023/1/31 7:30, Jaegeuk Kim wrote: > If the storage gives a corrupted node block due to short power failure and > reset, f2fs stops the entire operations by setting the checkpoint failure flag. > > Let's give more chances to live by re-issuing IOs for a while in such critical > path. > > Cc: stable@vger.kernel.org > Suggested-by: Randall Huang <huangrandall@google.com> > Suggested-by: Chao Yu <chao@kernel.org> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Thanks,
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 01b9e6f85f6b..66e407fcefd3 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -719,10 +719,10 @@ void f2fs_update_inode_page(struct inode *inode) if (IS_ERR(node_page)) { int err = PTR_ERR(node_page); - if (err == -ENOMEM) { + if (err == -ENOMEM || err == -EIO) { cond_resched(); goto retry; - } else if (err != -ENOENT) { + } else if (err != -ENOENT || f2fs_cp_error(sbi)) { f2fs_stop_checkpoint(sbi, false, STOP_CP_REASON_UPDATE_INODE); }