Message ID | 1375357892-10188-1-git-send-email-handai.szj@taobao.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, 1 Aug 2013, Yan, Zheng wrote:
> On Thu, Aug 1, 2013 at 7:51 PM, Sha Zhengju <handai.szj@gmail.com> wrote:
> > From: Sha Zhengju <handai.szj@taobao.com>
> >
> > Following we will begin to add memcg dirty page accounting around
> > __set_page_dirty_{buffers,nobuffers} in vfs layer, so we'd better use
> > vfs interface to avoid exporting those details to filesystems.
> >
> > Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> > ---
> >  fs/ceph/addr.c |   13 +------------
> >  1 file changed, 1 insertion(+), 12 deletions(-)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 3e68ac1..1445bf1 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -76,7 +76,7 @@ static int ceph_set_page_dirty(struct page *page)
> >         if (unlikely(!mapping))
> >                 return !TestSetPageDirty(page);
> >
> > -       if (TestSetPageDirty(page)) {
> > +       if (!__set_page_dirty_nobuffers(page)) {
>
> It's too early to set the radix tree tag here. We should set the page's
> snapshot context and increase the i_wrbuffer_ref first. This is because
> once the tag is set, the writeback thread can find and start flushing the
> page.

Unfortunately I only remember being frustrated by this code. :)  Looking
at it now, though, it seems like the minimum fix is to set the
page->private before marking the page dirty. I don't know the locking
rules around that, though. If that is potentially racy, maybe the safest
thing would be if __set_page_dirty_nobuffers() took a void* to set
page->private to atomically while holding the tree_lock.

sage

>
> >                 dout("%p set_page_dirty %p idx %lu -- already dirty\n",
> >                      mapping->host, page, page->index);
> >                 return 0;
> > @@ -107,14 +107,7 @@ static int ceph_set_page_dirty(struct page *page)
> >              snapc, snapc->seq, snapc->num_snaps);
> >         spin_unlock(&ci->i_ceph_lock);
> >
> > -       /* now adjust page */
> > -       spin_lock_irq(&mapping->tree_lock);
> >         if (page->mapping) {    /* Race with truncate? */
> > -               WARN_ON_ONCE(!PageUptodate(page));
> > -               account_page_dirtied(page, page->mapping);
> > -               radix_tree_tag_set(&mapping->page_tree,
> > -                               page_index(page), PAGECACHE_TAG_DIRTY);
> > -
>
> This code was copied from __set_page_dirty_nobuffers(). I think the reason
> Sage did this is to handle the race described in
> __set_page_dirty_nobuffers()'s comment. But I wonder whether
> "page->mapping == NULL" can still happen here, because
> truncate_inode_page() unmaps the page from processes' address spaces
> first, then deletes the page from the page cache.
>
> Regards
> Yan, Zheng
>
> >                 /*
> >                  * Reference snap context in page->private.  Also set
> >                  * PagePrivate so that we get invalidatepage callback.
> > @@ -126,14 +119,10 @@ static int ceph_set_page_dirty(struct page *page)
> >                 undo = 1;
> >         }
> >
> > -       spin_unlock_irq(&mapping->tree_lock);
> > -
> >         if (undo)
> >                 /* whoops, we failed to dirty the page */
> >                 ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
> >
> > -       __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> > -
> >         BUG_ON(!PageDirty(page));
> >         return 1;
> >  }
> > --
> > 1.7.9.5
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html

On Thu, Aug 1, 2013 at 11:19 PM, Yan, Zheng <ukernel@gmail.com> wrote:
> On Thu, Aug 1, 2013 at 7:51 PM, Sha Zhengju <handai.szj@gmail.com> wrote:
>> From: Sha Zhengju <handai.szj@taobao.com>
>>
>> Following we will begin to add memcg dirty page accounting around
>> __set_page_dirty_{buffers,nobuffers} in vfs layer, so we'd better use
>> vfs interface to avoid exporting those details to filesystems.
>>
>> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
>> ---
>>  fs/ceph/addr.c |   13 +------------
>>  1 file changed, 1 insertion(+), 12 deletions(-)
>>
>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>> index 3e68ac1..1445bf1 100644
>> --- a/fs/ceph/addr.c
>> +++ b/fs/ceph/addr.c
>> @@ -76,7 +76,7 @@ static int ceph_set_page_dirty(struct page *page)
>>         if (unlikely(!mapping))
>>                 return !TestSetPageDirty(page);
>>
>> -       if (TestSetPageDirty(page)) {
>> +       if (!__set_page_dirty_nobuffers(page)) {
>
> It's too early to set the radix tree tag here. We should set the page's
> snapshot context and increase the i_wrbuffer_ref first. This is because
> once the tag is set, the writeback thread can find and start flushing the
> page.

OK, thanks for pointing it out.

>
>>                 dout("%p set_page_dirty %p idx %lu -- already dirty\n",
>>                      mapping->host, page, page->index);
>>                 return 0;
>> @@ -107,14 +107,7 @@ static int ceph_set_page_dirty(struct page *page)
>>              snapc, snapc->seq, snapc->num_snaps);
>>         spin_unlock(&ci->i_ceph_lock);
>>
>> -       /* now adjust page */
>> -       spin_lock_irq(&mapping->tree_lock);
>>         if (page->mapping) {    /* Race with truncate? */
>> -               WARN_ON_ONCE(!PageUptodate(page));
>> -               account_page_dirtied(page, page->mapping);
>> -               radix_tree_tag_set(&mapping->page_tree,
>> -                               page_index(page), PAGECACHE_TAG_DIRTY);
>> -
>
> This code was copied from __set_page_dirty_nobuffers(). I think the reason
> Sage did this is to handle the race described in
> __set_page_dirty_nobuffers()'s comment. But I wonder whether
> "page->mapping == NULL" can still happen here, because
> truncate_inode_page() unmaps the page from processes' address spaces
> first, then deletes the page from the page cache.

But in the non-mmap case, isn't this unrelated to 'unmap page from
address spaces'?
The check is there precisely to avoid racing with delete_from_page_cache(),
since both need to hold mapping->tree_lock, and if truncate goes first
then __set_page_dirty_nobuffers() may see a NULL mapping.


Thanks,
Sha

On Fri, Aug 2, 2013 at 2:27 AM, Sage Weil <sage@inktank.com> wrote:
> On Thu, 1 Aug 2013, Yan, Zheng wrote:
>> On Thu, Aug 1, 2013 at 7:51 PM, Sha Zhengju <handai.szj@gmail.com> wrote:
>> > From: Sha Zhengju <handai.szj@taobao.com>
>> >
>> > Following we will begin to add memcg dirty page accounting around
>> > __set_page_dirty_{buffers,nobuffers} in vfs layer, so we'd better use
>> > vfs interface to avoid exporting those details to filesystems.
>> >
>> > Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
>> > ---
>> >  fs/ceph/addr.c |   13 +------------
>> >  1 file changed, 1 insertion(+), 12 deletions(-)
>> >
>> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>> > index 3e68ac1..1445bf1 100644
>> > --- a/fs/ceph/addr.c
>> > +++ b/fs/ceph/addr.c
>> > @@ -76,7 +76,7 @@ static int ceph_set_page_dirty(struct page *page)
>> >         if (unlikely(!mapping))
>> >                 return !TestSetPageDirty(page);
>> >
>> > -       if (TestSetPageDirty(page)) {
>> > +       if (!__set_page_dirty_nobuffers(page)) {
>>
>> It's too early to set the radix tree tag here. We should set the page's
>> snapshot context and increase the i_wrbuffer_ref first. This is because
>> once the tag is set, the writeback thread can find and start flushing the
>> page.
>
> Unfortunately I only remember being frustrated by this code. :)  Looking
> at it now, though, it seems like the minimum fix is to set the
> page->private before marking the page dirty. I don't know the locking
> rules around that, though. If that is potentially racy, maybe the safest
> thing would be if __set_page_dirty_nobuffers() took a void* to set
> page->private to atomically while holding the tree_lock.
>

Sorry, I didn't catch the point of your last sentence. Could you
please explain it again?

I notice there is a check in __set_page_dirty_nobuffers():

        WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));

So does that mean we can only set page->private after it? But if so,
__mark_inode_dirty() is still called before the snapc is set.

Thanks,
Sha

On Fri, Aug 2, 2013 at 5:04 PM, Sha Zhengju <handai.szj@gmail.com> wrote:
>
> On Thu, Aug 1, 2013 at 11:19 PM, Yan, Zheng <ukernel@gmail.com> wrote:
> > On Thu, Aug 1, 2013 at 7:51 PM, Sha Zhengju <handai.szj@gmail.com> wrote:
> >> From: Sha Zhengju <handai.szj@taobao.com>
> >>
> >> Following we will begin to add memcg dirty page accounting around
> >> __set_page_dirty_{buffers,nobuffers} in vfs layer, so we'd better use
> >> vfs interface to avoid exporting those details to filesystems.
> >>
> >> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> >> ---
> >>  fs/ceph/addr.c |   13 +------------
> >>  1 file changed, 1 insertion(+), 12 deletions(-)
> >>
> >> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> >> index 3e68ac1..1445bf1 100644
> >> --- a/fs/ceph/addr.c
> >> +++ b/fs/ceph/addr.c
> >> @@ -76,7 +76,7 @@ static int ceph_set_page_dirty(struct page *page)
> >>         if (unlikely(!mapping))
> >>                 return !TestSetPageDirty(page);
> >>
> >> -       if (TestSetPageDirty(page)) {
> >> +       if (!__set_page_dirty_nobuffers(page)) {
> >
> > It's too early to set the radix tree tag here. We should set the page's
> > snapshot context and increase the i_wrbuffer_ref first. This is because
> > once the tag is set, the writeback thread can find and start flushing the
> > page.
>
> OK, thanks for pointing it out.
>
> >
> >>                 dout("%p set_page_dirty %p idx %lu -- already dirty\n",
> >>                      mapping->host, page, page->index);
> >>                 return 0;
> >> @@ -107,14 +107,7 @@ static int ceph_set_page_dirty(struct page *page)
> >>              snapc, snapc->seq, snapc->num_snaps);
> >>         spin_unlock(&ci->i_ceph_lock);
> >>
> >> -       /* now adjust page */
> >> -       spin_lock_irq(&mapping->tree_lock);
> >>         if (page->mapping) {    /* Race with truncate? */
> >> -               WARN_ON_ONCE(!PageUptodate(page));
> >> -               account_page_dirtied(page, page->mapping);
> >> -               radix_tree_tag_set(&mapping->page_tree,
> >> -                               page_index(page), PAGECACHE_TAG_DIRTY);
> >> -
> >
> > This code was copied from __set_page_dirty_nobuffers(). I think the reason
> > Sage did this is to handle the race described in
> > __set_page_dirty_nobuffers()'s comment. But I wonder whether
> > "page->mapping == NULL" can still happen here, because
> > truncate_inode_page() unmaps the page from processes' address spaces
> > first, then deletes the page from the page cache.
>
> But in the non-mmap case, isn't this unrelated to 'unmap page from
> address spaces'?

In the non-mmap case, the page is locked when the set_page_dirty() callback
is called. truncate_inode_page() waits until the page is unlocked, then
deletes it from the page cache.

Regards
Yan, Zheng

> The check is there precisely to avoid racing with delete_from_page_cache(),
> since both need to hold mapping->tree_lock, and if truncate goes first
> then __set_page_dirty_nobuffers() may see a NULL mapping.
>
>
> Thanks,
> Sha

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 3e68ac1..1445bf1 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -76,7 +76,7 @@ static int ceph_set_page_dirty(struct page *page)
 	if (unlikely(!mapping))
 		return !TestSetPageDirty(page);
 
-	if (TestSetPageDirty(page)) {
+	if (!__set_page_dirty_nobuffers(page)) {
 		dout("%p set_page_dirty %p idx %lu -- already dirty\n",
 		     mapping->host, page, page->index);
 		return 0;
@@ -107,14 +107,7 @@ static int ceph_set_page_dirty(struct page *page)
 	     snapc, snapc->seq, snapc->num_snaps);
 	spin_unlock(&ci->i_ceph_lock);
 
-	/* now adjust page */
-	spin_lock_irq(&mapping->tree_lock);
 	if (page->mapping) {	/* Race with truncate? */
-		WARN_ON_ONCE(!PageUptodate(page));
-		account_page_dirtied(page, page->mapping);
-		radix_tree_tag_set(&mapping->page_tree,
-				page_index(page), PAGECACHE_TAG_DIRTY);
-
 		/*
 		 * Reference snap context in page->private.  Also set
 		 * PagePrivate so that we get invalidatepage callback.
@@ -126,14 +119,10 @@ static int ceph_set_page_dirty(struct page *page)
 		undo = 1;
 	}
 
-	spin_unlock_irq(&mapping->tree_lock);
-
 	if (undo)
 		/* whoops, we failed to dirty the page */
 		ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
 
-	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-
 	BUG_ON(!PageDirty(page));
 	return 1;
 }
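
The ordering concern raised in the review (the dirty tag becoming visible to writeback before page->private holds the snap context) can be illustrated with a small userspace model. This is only a sketch, not kernel code and not a proposed patch: struct model_mapping, struct model_page, model_set_page_dirty_private(), model_writeback_lookup() and model_demo() are all hypothetical names, a pthread mutex stands in for mapping->tree_lock, and the dirty flag is tested under that lock purely for simplicity. It shows one possible reading of the idea that the dirtying helper could take a void* and attach it while holding the lock, so that a scanner taking the same lock never observes a tagged page without its private data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct model_mapping {
	pthread_mutex_t tree_lock;	/* plays the role of mapping->tree_lock */
};

struct model_page {
	struct model_mapping *mapping;	/* NULL once truncate removed the page */
	bool dirty;			/* plays the role of PageDirty */
	bool tagged;			/* plays the role of PAGECACHE_TAG_DIRTY */
	void *private_data;		/* plays the role of page->private */
};

/*
 * Hypothetical helper: dirty the page and attach priv in one critical section.
 * Returns 0 if the page was already dirty, 1 if it was dirtied and tagged,
 * and -1 if it was dirtied but already truncated (priv is not attached).
 */
int model_set_page_dirty_private(struct model_mapping *mapping,
				 struct model_page *page, void *priv)
{
	int ret;

	pthread_mutex_lock(&mapping->tree_lock);
	if (page->dirty) {
		ret = 0;
	} else {
		page->dirty = true;
		if (page->mapping) {
			/* Attach the private data before the tag is set. */
			page->private_data = priv;
			page->tagged = true;
			ret = 1;
		} else {
			ret = -1;	/* lost the race with truncate */
		}
	}
	pthread_mutex_unlock(&mapping->tree_lock);
	return ret;
}

/* Model of a writeback scan: private data of a tagged page, else NULL. */
void *model_writeback_lookup(struct model_mapping *mapping,
			     struct model_page *page)
{
	void *priv = NULL;

	pthread_mutex_lock(&mapping->tree_lock);
	if (page->tagged)
		priv = page->private_data;
	pthread_mutex_unlock(&mapping->tree_lock);
	return priv;
}

/* Usage example: returns 0 when every step behaves as described above. */
int model_demo(void)
{
	struct model_mapping mapping;
	struct model_page page = { .mapping = &mapping };
	struct model_page gone = { .mapping = NULL };	/* already truncated */
	int snapc = 0;	/* placeholder object standing in for a snap context */
	int rc = 0;

	pthread_mutex_init(&mapping.tree_lock, NULL);
	if (model_set_page_dirty_private(&mapping, &page, &snapc) != 1)
		rc = 1;
	else if (model_writeback_lookup(&mapping, &page) != &snapc)
		rc = 2;
	else if (model_set_page_dirty_private(&mapping, &page, &snapc) != 0)
		rc = 3;
	else if (model_set_page_dirty_private(&mapping, &gone, &snapc) != -1)
		rc = 4;
	else if (gone.private_data != NULL)
		rc = 5;
	pthread_mutex_destroy(&mapping.tree_lock);
	return rc;
}
```

In this model the -1 return corresponds to the "undo" path in ceph_set_page_dirty(), where the caller would drop the reference it took. Whether the real locking rules permit setting page->private under tree_lock is exactly the open question in the thread, so the sketch should not be read as an answer to it.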