Message ID | 1543200508-6838-7-git-send-email-jsimmons@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | lustre: new patches to address previous reviews | expand |
On Sun, Nov 25 2018, James Simmons wrote: > From: Lai Siyao <lai.siyao@whamcloud.com> > > Reading directory pages may fail on MDS, in this case client should > not cache a non-up-to-date directory page, because it will cause > a later read on the same page fail. > > Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com> > WC-bug-id: https://jira.whamcloud.com/browse/LU-5461 > Reviewed-on: http://review.whamcloud.com/11450 > Reviewed-by: Fan Yong <fan.yong@intel.com> > Reviewed-by: John L. Hammond <jhammond@whamcloud.com> > Reviewed-by: Oleg Drokin <green@whamcloud.com> > Signed-off-by: James Simmons <jsimmons@infradead.org> > --- > drivers/staging/lustre/lustre/mdc/mdc_request.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c > index 1072b66..09b30ef 100644 > --- a/drivers/staging/lustre/lustre/mdc/mdc_request.c > +++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c > @@ -1212,7 +1212,10 @@ static int mdc_read_page_remote(void *data, struct page *page0) > } > > rc = mdc_getpage(rp->rp_exp, fid, rp->rp_off, page_pool, npages, &req); > - if (!rc) { > + if (rc < 0) { > + /* page0 is special, which was added into page cache early */ > + delete_from_page_cache(page0); This looks wrong to me. We shouldn't need to delete the page from the page-cache. It won't be marked up-to-date, so why will that cause an error on a later read??? Well, because mdc_page_locate() finds a page and if it isn't up-to-date, it returns -EIO. Why does it do that? If it found a PageError() page, then it might be reasonable to return -EIO. Why not just return the page that was found, and let the caller check if it is Uptodate? Well, because mdc_read_page() handles a successful page return from mdc_page_locate() as a hash collision. I guess I need to understand how lustre maps a hash to a directory block, and then how it handles collisions... The reason this jumped out at me is that it looks like it might be racy. Adding a page and then removing it might leave a window where some other thread can find the page. That is not a problem is a non-up-to-date page just means we should wait for it. But if it can cause an error, then maybe the race is a real problem. But maybe there is some higher level locking... NeilBrown
You're right, I'll fix it later. Thanks, Lai On 2018/11/27, 11:01 AM, "NeilBrown" <neilb@suse.com> wrote: On Sun, Nov 25 2018, James Simmons wrote: > From: Lai Siyao <lai.siyao@whamcloud.com> > > Reading directory pages may fail on MDS, in this case client should > not cache a non-up-to-date directory page, because it will cause > a later read on the same page fail. > > Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com> > WC-bug-id: https://jira.whamcloud.com/browse/LU-5461 > Reviewed-on: http://review.whamcloud.com/11450 > Reviewed-by: Fan Yong <fan.yong@intel.com> > Reviewed-by: John L. Hammond <jhammond@whamcloud.com> > Reviewed-by: Oleg Drokin <green@whamcloud.com> > Signed-off-by: James Simmons <jsimmons@infradead.org> > --- > drivers/staging/lustre/lustre/mdc/mdc_request.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c > index 1072b66..09b30ef 100644 > --- a/drivers/staging/lustre/lustre/mdc/mdc_request.c > +++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c > @@ -1212,7 +1212,10 @@ static int mdc_read_page_remote(void *data, struct page *page0) > } > > rc = mdc_getpage(rp->rp_exp, fid, rp->rp_off, page_pool, npages, &req); > - if (!rc) { > + if (rc < 0) { > + /* page0 is special, which was added into page cache early */ > + delete_from_page_cache(page0); This looks wrong to me. We shouldn't need to delete the page from the page-cache. It won't be marked up-to-date, so why will that cause an error on a later read??? Well, because mdc_page_locate() finds a page and if it isn't up-to-date, it returns -EIO. Why does it do that? If it found a PageError() page, then it might be reasonable to return -EIO. Why not just return the page that was found, and let the caller check if it is Uptodate? Well, because mdc_read_page() handles a successful page return from mdc_page_locate() as a hash collision. I guess I need to understand how lustre maps a hash to a directory block, and then how it handles collisions... The reason this jumped out at me is that it looks like it might be racy. Adding a page and then removing it might leave a window where some other thread can find the page. That is not a problem is a non-up-to-date page just means we should wait for it. But if it can cause an error, then maybe the race is a real problem. But maybe there is some higher level locking... NeilBrown
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c index 1072b66..09b30ef 100644 --- a/drivers/staging/lustre/lustre/mdc/mdc_request.c +++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c @@ -1212,7 +1212,10 @@ static int mdc_read_page_remote(void *data, struct page *page0) } rc = mdc_getpage(rp->rp_exp, fid, rp->rp_off, page_pool, npages, &req); - if (!rc) { + if (rc < 0) { + /* page0 is special, which was added into page cache early */ + delete_from_page_cache(page0); + } else { int lu_pgs = req->rq_bulk->bd_nob_transferred; rd_pgs = (lu_pgs + PAGE_SIZE - 1) >> PAGE_SHIFT;