Message ID | CAJMB+Ng3MPHfMXcPvTU0udxX7_Z5+pFVhWVQuP-yzecA-QqV8w@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote: > Hi Andrew and everybody else, > > I looked through the pre-git history and seem to have found the reason of how > the current code had come to be, and why Sergei's fix seems to involve > re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, > in that the first !REF_PAGES should be REF_PAGES. Then the "remove old > debug code" > commit below did not remove the right stuff. > > Looking at the old code, I think REF_PAGES may have started out being > 1 (i.e. pages are released right after create, then get/put when needed, no > need to release on bnode_free), > then the code was modified for efficiency so that pages are not released > on bnode_create and not put/get at bnode_put/get but release at bnode_free. > But it was still largely working in the REF_PAGE=1 mode (and with the commented > out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). > Then it was flipped over, and seems to work, and things were forgotten. > > I think the release at bnode_free was commented out because the person who > wrote it wasn't sure - I am not sure either, since it seems like there might be > other/more places that one might need to release the pages than just > at bnode_free(). > > Does this train of thought make sense? No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well, at first it was my thought too. But later I considered what the logic of the author could have been with REF_PAGES=0 and REF_PAGES=1 and came to a conclusion that REF_PAGES=1 was a safe approach and REF_PAGES=0 was an experimental unsafe release-early approach, so !REF_PAGES was what he really meant. > > Hin-Tak > > <commit> > commit a5e3985fa014029eb6795664c704953720cc7f7d > Author: Roman Zippel <zippel@linux-m68k.org> > Date: Tue Sep 6 15:18:47 2005 -0700 > > [PATCH] hfs: remove debug code > > This removes some old debug code, which is no longer needed. > > Signed-off-by: Roman Zippel <zippel@linux-m68k.org> > Signed-off-by: Andrew Morton <akpm@osdl.org> > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c > index a096c5a..3d5cdc6 100644 > --- a/fs/hfs/bnode.c > +++ b/fs/hfs/bnode.c > @@ -13,8 +13,6 @@ > > #include "btree.h" > > -#define REF_PAGES 0 > - > void hfs_bnode_read(struct hfs_bnode *node, void *buf, > int off, int len) > { > @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct > hfs_btree *tree, u32 cnid) > page_cache_release(page); > goto fail; > } > -#if !REF_PAGES > page_cache_release(page); > -#endif > node->page[i] = page; > } > > @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) > { > if (node) { > atomic_inc(&node->refcnt); > -#if REF_PAGES > - { > - int i; > - for (i = 0; i < node->tree->pages_per_bnode; i++) > - get_page(node->page[i]); > - } > -#endif > dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", > node->tree->cnid, node->this, atomic_read(&node->refcnt)); > } > @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) > node->tree->cnid, node->this, atomic_read(&node->refcnt)); > if (!atomic_read(&node->refcnt)) > BUG(); > - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { > -#if REF_PAGES > - for (i = 0; i < tree->pages_per_bnode; i++) > - put_page(node->page[i]); > -#endif > + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) > return; > - } > for (i = 0; i < tree->pages_per_bnode; i++) { > if (!node->page[i]) > continue; > mark_page_accessed(node->page[i]); > -#if REF_PAGES > - put_page(node->page[i]); > -#endif > } > > if (test_bit(HFS_BNODE_DELETED, &node->flags)) { > diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c > index 8868d3b..b85abc6 100644 > --- a/fs/hfsplus/bnode.c > +++ b/fs/hfsplus/bnode.c > @@ -18,8 +18,6 @@ > #include "hfsplus_fs.h" > #include "hfsplus_raw.h" > > -#define REF_PAGES 0 > - > /* Copy a specified range of bytes from the raw data of a node */ > void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) > { > @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct > hfs_btree *tree, u32 cnid) > page_cache_release(page); > goto fail; > } > -#if !REF_PAGES > page_cache_release(page); > -#endif > node->page[i] = page; > } > > @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node) > { > if (node) { > atomic_inc(&node->refcnt); > -#if REF_PAGES > - { > - int i; > - for (i = 0; i < node->tree->pages_per_bnode; i++) > - get_page(node->page[i]); > - } > -#endif > dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", > node->tree->cnid, node->this, atomic_read(&node->refcnt)); > } > @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node) > node->tree->cnid, node->this, atomic_read(&node->refcnt)); > if (!atomic_read(&node->refcnt)) > BUG(); > - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { > -#if REF_PAGES > - for (i = 0; i < tree->pages_per_bnode; i++) > - put_page(node->page[i]); > -#endif > + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) > return; > - } > for (i = 0; i < tree->pages_per_bnode; i++) { > if (!node->page[i]) > continue; > mark_page_accessed(node->page[i]); > -#if REF_PAGES > - put_page(node->page[i]); > -#endif > } > > if (test_bit(HFS_BNODE_DELETED, &node->flags)) { > </commit> > > On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote: >> ------------------------------ >> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote: >> >>>Hi Andrew, >>> >>>Can you please take this patch up and get it merged into mainline? Despite Vyacheslav's lamentations this patch is obviously correct. __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed. >>> >>>Feel free to add: >>> >>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com> >>> >>>Thanks a lot in advance! >>> >>>Best regards, >>> >>> Anton >> >> I went around and looked at the code and Anton is right - __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong. >> >> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point. >> >> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead. >> I'm looking over the pre-git history to see how that comes about. >> >> Hin-Tak >> >>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote: >>>> >>>> Fix this bugreport by Sasha Levin: >>>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode. >>>> >>>> Cc: Al Viro <viro@zeniv.linux.org.uk> >>>> Cc: Christoph Hellwig <hch@infradead.org> >>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> >>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> >>>> Cc: Sougata Santra <sougata@tuxera.com> >>>> Reported-by: Sasha Levin <sasha.levin@oracle.com> >>>> Signed-off-by: Sergei Antonov <saproj@gmail.com> >>>> --- >>>> fs/hfsplus/bnode.c | 6 ++---- >>>> 1 file changed, 2 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>> index 759708f..5af50fb 100644 >>>> --- a/fs/hfsplus/bnode.c >>>> +++ b/fs/hfsplus/bnode.c >>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) >>>> page_cache_release(page); >>>> goto fail; >>>> } >>>> - page_cache_release(page); >>>> node->page[i] = page; >>>> } >>>> >>>> @@ -566,13 +565,12 @@ node_error: >>>> >>>> void hfs_bnode_free(struct hfs_bnode *node) >>>> { >>>> -#if 0 >>>> int i; >>>> >>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>> + for (i = 0; i < node->tree->pages_per_bnode; i++) { >>>> if (node->page[i]) >>>> page_cache_release(node->page[i]); >>>> -#endif >>>> + } >>>> kfree(node); >>>> } >>>> >>>> -- >>>> 2.3.0 >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>>-- >>>Anton Altaparmakov <anton at tuxera.com> (replace at with @) >>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ >>>Linux NTFS maintainer >>> >> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote: > On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >> Hi Andrew and everybody else, >> >> I looked through the pre-git history and seem to have found the reason of how >> the current code had come to be, and why Sergei's fix seems to involve >> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, >> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old >> debug code" >> commit below did not remove the right stuff. >> >> Looking at the old code, I think REF_PAGES may have started out being >> 1 (i.e. pages are released right after create, then get/put when needed, no >> need to release on bnode_free), >> then the code was modified for efficiency so that pages are not released >> on bnode_create and not put/get at bnode_put/get but release at bnode_free. >> But it was still largely working in the REF_PAGE=1 mode (and with the commented >> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). >> Then it was flipped over, and seems to work, and things were forgotten. >> >> I think the release at bnode_free was commented out because the person who >> wrote it wasn't sure - I am not sure either, since it seems like there might be >> other/more places that one might need to release the pages than just >> at bnode_free(). >> >> Does this train of thought make sense? > > No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well, > at first it was my thought too. But later I considered what the logic > of the author could have been with REF_PAGES=0 and REF_PAGES=1 and > came to a conclusion that REF_PAGES=1 was a safe approach and > REF_PAGES=0 was an experimental unsafe release-early approach, so > !REF_PAGES was what he really meant. > You are wrong, it seems. I found even earlier versions of the code, with history, at http://sourceforge.net/projects/linux-hfsplus/ . The earlier version of the code (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/) does get_page and put_page at bnode_get and bnode_put, while the later version (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ ) have those removed, dated 2001 december and 2002 june. The change log says so: Date: Thu Jun 6 09:45:14 2002 +0000 Use buffer cache instead of page cache in bnode.c. Cache inode extents. The change also removed any page_cache_release() . It would seem that the additional REF_PAGES conditional was a fork/rewrite/clean-up which re-uses the earlier development logic, and eventually took over. The release-early code came first, not after. That makes sense, as releasing early and read_cache_page on each bnode_get is the slower, safer but inefficient approach. It would be absurd to go from caching pages, then to 'experiment' to see if release-early + later read_cache_page on each bnode_get 'improves'. >> >> Hin-Tak >> >> <commit> >> commit a5e3985fa014029eb6795664c704953720cc7f7d >> Author: Roman Zippel <zippel@linux-m68k.org> >> Date: Tue Sep 6 15:18:47 2005 -0700 >> >> [PATCH] hfs: remove debug code >> >> This removes some old debug code, which is no longer needed. >> >> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> >> Signed-off-by: Andrew Morton <akpm@osdl.org> >> Signed-off-by: Linus Torvalds <torvalds@osdl.org> >> >> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c >> index a096c5a..3d5cdc6 100644 >> --- a/fs/hfs/bnode.c >> +++ b/fs/hfs/bnode.c >> @@ -13,8 +13,6 @@ >> >> #include "btree.h" >> >> -#define REF_PAGES 0 >> - >> void hfs_bnode_read(struct hfs_bnode *node, void *buf, >> int off, int len) >> { >> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >> hfs_btree *tree, u32 cnid) >> page_cache_release(page); >> goto fail; >> } >> -#if !REF_PAGES >> page_cache_release(page); >> -#endif >> node->page[i] = page; >> } >> >> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >> { >> if (node) { >> atomic_inc(&node->refcnt); >> -#if REF_PAGES >> - { >> - int i; >> - for (i = 0; i < node->tree->pages_per_bnode; i++) >> - get_page(node->page[i]); >> - } >> -#endif >> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >> } >> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >> if (!atomic_read(&node->refcnt)) >> BUG(); >> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >> -#if REF_PAGES >> - for (i = 0; i < tree->pages_per_bnode; i++) >> - put_page(node->page[i]); >> -#endif >> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >> return; >> - } >> for (i = 0; i < tree->pages_per_bnode; i++) { >> if (!node->page[i]) >> continue; >> mark_page_accessed(node->page[i]); >> -#if REF_PAGES >> - put_page(node->page[i]); >> -#endif >> } >> >> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >> index 8868d3b..b85abc6 100644 >> --- a/fs/hfsplus/bnode.c >> +++ b/fs/hfsplus/bnode.c >> @@ -18,8 +18,6 @@ >> #include "hfsplus_fs.h" >> #include "hfsplus_raw.h" >> >> -#define REF_PAGES 0 >> - >> /* Copy a specified range of bytes from the raw data of a node */ >> void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) >> { >> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >> hfs_btree *tree, u32 cnid) >> page_cache_release(page); >> goto fail; >> } >> -#if !REF_PAGES >> page_cache_release(page); >> -#endif >> node->page[i] = page; >> } >> >> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >> { >> if (node) { >> atomic_inc(&node->refcnt); >> -#if REF_PAGES >> - { >> - int i; >> - for (i = 0; i < node->tree->pages_per_bnode; i++) >> - get_page(node->page[i]); >> - } >> -#endif >> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >> } >> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >> if (!atomic_read(&node->refcnt)) >> BUG(); >> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >> -#if REF_PAGES >> - for (i = 0; i < tree->pages_per_bnode; i++) >> - put_page(node->page[i]); >> -#endif >> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >> return; >> - } >> for (i = 0; i < tree->pages_per_bnode; i++) { >> if (!node->page[i]) >> continue; >> mark_page_accessed(node->page[i]); >> -#if REF_PAGES >> - put_page(node->page[i]); >> -#endif >> } >> >> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >> </commit> >> >> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote: >>> ------------------------------ >>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote: >>> >>>>Hi Andrew, >>>> >>>>Can you please take this patch up and get it merged into mainline? Despite Vyacheslav's lamentations this patch is obviously correct. __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed. >>>> >>>>Feel free to add: >>>> >>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com> >>>> >>>>Thanks a lot in advance! >>>> >>>>Best regards, >>>> >>>> Anton >>> >>> I went around and looked at the code and Anton is right - __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong. >>> >>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point. >>> >>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead. >>> I'm looking over the pre-git history to see how that comes about. >>> >>> Hin-Tak >>> >>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote: >>>>> >>>>> Fix this bugreport by Sasha Levin: >>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode. >>>>> >>>>> Cc: Al Viro <viro@zeniv.linux.org.uk> >>>>> Cc: Christoph Hellwig <hch@infradead.org> >>>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> >>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> >>>>> Cc: Sougata Santra <sougata@tuxera.com> >>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com> >>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com> >>>>> --- >>>>> fs/hfsplus/bnode.c | 6 ++---- >>>>> 1 file changed, 2 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>> index 759708f..5af50fb 100644 >>>>> --- a/fs/hfsplus/bnode.c >>>>> +++ b/fs/hfsplus/bnode.c >>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) >>>>> page_cache_release(page); >>>>> goto fail; >>>>> } >>>>> - page_cache_release(page); >>>>> node->page[i] = page; >>>>> } >>>>> >>>>> @@ -566,13 +565,12 @@ node_error: >>>>> >>>>> void hfs_bnode_free(struct hfs_bnode *node) >>>>> { >>>>> -#if 0 >>>>> int i; >>>>> >>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>> + for (i = 0; i < node->tree->pages_per_bnode; i++) { >>>>> if (node->page[i]) >>>>> page_cache_release(node->page[i]); >>>>> -#endif >>>>> + } >>>>> kfree(node); >>>>> } >>>>> >>>>> -- >>>>> 2.3.0 >>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>>-- >>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @) >>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ >>>>Linux NTFS maintainer >>>> >>> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote: > On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote: >> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >>> Hi Andrew and everybody else, >>> >>> I looked through the pre-git history and seem to have found the reason of how >>> the current code had come to be, and why Sergei's fix seems to involve >>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, >>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old >>> debug code" >>> commit below did not remove the right stuff. >>> >>> Looking at the old code, I think REF_PAGES may have started out being >>> 1 (i.e. pages are released right after create, then get/put when needed, no >>> need to release on bnode_free), >>> then the code was modified for efficiency so that pages are not released >>> on bnode_create and not put/get at bnode_put/get but release at bnode_free. >>> But it was still largely working in the REF_PAGE=1 mode (and with the commented >>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). >>> Then it was flipped over, and seems to work, and things were forgotten. >>> >>> I think the release at bnode_free was commented out because the person who >>> wrote it wasn't sure - I am not sure either, since it seems like there might be >>> other/more places that one might need to release the pages than just >>> at bnode_free(). >>> >>> Does this train of thought make sense? >> >> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well, >> at first it was my thought too. But later I considered what the logic >> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and >> came to a conclusion that REF_PAGES=1 was a safe approach and >> REF_PAGES=0 was an experimental unsafe release-early approach, so >> !REF_PAGES was what he really meant. >> > > You are wrong, it seems. I found even earlier versions of the code, > with history, at > http://sourceforge.net/projects/linux-hfsplus/ . > The earlier version of the code > (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/) > does get_page and put_page at bnode_get and bnode_put, while the later version > (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ > ) > have those removed, dated 2001 december and 2002 june. Interesting. I only looked at historical linux git. > The change log says so: > > Date: Thu Jun 6 09:45:14 2002 +0000 > > Use buffer cache instead of page cache in bnode.c. Cache inode extents. > > The change also removed any page_cache_release() . It would seem that > the additional REF_PAGES > conditional was a fork/rewrite/clean-up which re-uses the earlier > development logic, and eventually took over. > > The release-early code came first, not after. That makes sense, as > releasing early We may be talking about different things. I did not write what code came earlier and what after. Anyway, this becomes to difficult to follow. My point was that in the variant without refs, calling page_cache_release() right after to read_cache_page() was exactly what the author wanted (although it is a bug). Again, we are probably arguing about different things. And I'm a little less zealous than you to do this code archeology. > and read_cache_page on each bnode_get is the slower, safer but > inefficient approach. > It would be absurd to go from caching pages, then to 'experiment' to > see if release-early + later read_cache_page > on each bnode_get 'improves'. > >>> >>> Hin-Tak >>> >>> <commit> >>> commit a5e3985fa014029eb6795664c704953720cc7f7d >>> Author: Roman Zippel <zippel@linux-m68k.org> >>> Date: Tue Sep 6 15:18:47 2005 -0700 >>> >>> [PATCH] hfs: remove debug code >>> >>> This removes some old debug code, which is no longer needed. >>> >>> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> >>> Signed-off-by: Andrew Morton <akpm@osdl.org> >>> Signed-off-by: Linus Torvalds <torvalds@osdl.org> >>> >>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c >>> index a096c5a..3d5cdc6 100644 >>> --- a/fs/hfs/bnode.c >>> +++ b/fs/hfs/bnode.c >>> @@ -13,8 +13,6 @@ >>> >>> #include "btree.h" >>> >>> -#define REF_PAGES 0 >>> - >>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, >>> int off, int len) >>> { >>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>> hfs_btree *tree, u32 cnid) >>> page_cache_release(page); >>> goto fail; >>> } >>> -#if !REF_PAGES >>> page_cache_release(page); >>> -#endif >>> node->page[i] = page; >>> } >>> >>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>> { >>> if (node) { >>> atomic_inc(&node->refcnt); >>> -#if REF_PAGES >>> - { >>> - int i; >>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>> - get_page(node->page[i]); >>> - } >>> -#endif >>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>> } >>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>> if (!atomic_read(&node->refcnt)) >>> BUG(); >>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>> -#if REF_PAGES >>> - for (i = 0; i < tree->pages_per_bnode; i++) >>> - put_page(node->page[i]); >>> -#endif >>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>> return; >>> - } >>> for (i = 0; i < tree->pages_per_bnode; i++) { >>> if (!node->page[i]) >>> continue; >>> mark_page_accessed(node->page[i]); >>> -#if REF_PAGES >>> - put_page(node->page[i]); >>> -#endif >>> } >>> >>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>> index 8868d3b..b85abc6 100644 >>> --- a/fs/hfsplus/bnode.c >>> +++ b/fs/hfsplus/bnode.c >>> @@ -18,8 +18,6 @@ >>> #include "hfsplus_fs.h" >>> #include "hfsplus_raw.h" >>> >>> -#define REF_PAGES 0 >>> - >>> /* Copy a specified range of bytes from the raw data of a node */ >>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) >>> { >>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>> hfs_btree *tree, u32 cnid) >>> page_cache_release(page); >>> goto fail; >>> } >>> -#if !REF_PAGES >>> page_cache_release(page); >>> -#endif >>> node->page[i] = page; >>> } >>> >>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>> { >>> if (node) { >>> atomic_inc(&node->refcnt); >>> -#if REF_PAGES >>> - { >>> - int i; >>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>> - get_page(node->page[i]); >>> - } >>> -#endif >>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>> } >>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>> if (!atomic_read(&node->refcnt)) >>> BUG(); >>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>> -#if REF_PAGES >>> - for (i = 0; i < tree->pages_per_bnode; i++) >>> - put_page(node->page[i]); >>> -#endif >>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>> return; >>> - } >>> for (i = 0; i < tree->pages_per_bnode; i++) { >>> if (!node->page[i]) >>> continue; >>> mark_page_accessed(node->page[i]); >>> -#if REF_PAGES >>> - put_page(node->page[i]); >>> -#endif >>> } >>> >>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>> </commit> >>> >>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote: >>>> ------------------------------ >>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote: >>>> >>>>>Hi Andrew, >>>>> >>>>>Can you please take this patch up and get it merged into mainline? Despite Vyacheslav's lamentations this patch is obviously correct. __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed. >>>>> >>>>>Feel free to add: >>>>> >>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com> >>>>> >>>>>Thanks a lot in advance! >>>>> >>>>>Best regards, >>>>> >>>>> Anton >>>> >>>> I went around and looked at the code and Anton is right - __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong. >>>> >>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point. >>>> >>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead. >>>> I'm looking over the pre-git history to see how that comes about. >>>> >>>> Hin-Tak >>>> >>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote: >>>>>> >>>>>> Fix this bugreport by Sasha Levin: >>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode. >>>>>> >>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk> >>>>>> Cc: Christoph Hellwig <hch@infradead.org> >>>>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> >>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> >>>>>> Cc: Sougata Santra <sougata@tuxera.com> >>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com> >>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com> >>>>>> --- >>>>>> fs/hfsplus/bnode.c | 6 ++---- >>>>>> 1 file changed, 2 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>>> index 759708f..5af50fb 100644 >>>>>> --- a/fs/hfsplus/bnode.c >>>>>> +++ b/fs/hfsplus/bnode.c >>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) >>>>>> page_cache_release(page); >>>>>> goto fail; >>>>>> } >>>>>> - page_cache_release(page); >>>>>> node->page[i] = page; >>>>>> } >>>>>> >>>>>> @@ -566,13 +565,12 @@ node_error: >>>>>> >>>>>> void hfs_bnode_free(struct hfs_bnode *node) >>>>>> { >>>>>> -#if 0 >>>>>> int i; >>>>>> >>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>> + for (i = 0; i < node->tree->pages_per_bnode; i++) { >>>>>> if (node->page[i]) >>>>>> page_cache_release(node->page[i]); >>>>>> -#endif >>>>>> + } >>>>>> kfree(node); >>>>>> } >>>>>> >>>>>> -- >>>>>> 2.3.0 >>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>>>-- >>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @) >>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ >>>>>Linux NTFS maintainer >>>>> >>>> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote: > On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote: >>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >>>> Hi Andrew and everybody else, >>>> >>>> I looked through the pre-git history and seem to have found the reason of how >>>> the current code had come to be, and why Sergei's fix seems to involve >>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, >>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old >>>> debug code" >>>> commit below did not remove the right stuff. >>>> >>>> Looking at the old code, I think REF_PAGES may have started out being >>>> 1 (i.e. pages are released right after create, then get/put when needed, no >>>> need to release on bnode_free), >>>> then the code was modified for efficiency so that pages are not released >>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free. >>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented >>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). >>>> Then it was flipped over, and seems to work, and things were forgotten. >>>> >>>> I think the release at bnode_free was commented out because the person who >>>> wrote it wasn't sure - I am not sure either, since it seems like there might be >>>> other/more places that one might need to release the pages than just >>>> at bnode_free(). >>>> >>>> Does this train of thought make sense? >>> >>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well, >>> at first it was my thought too. But later I considered what the logic >>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and >>> came to a conclusion that REF_PAGES=1 was a safe approach and >>> REF_PAGES=0 was an experimental unsafe release-early approach, so >>> !REF_PAGES was what he really meant. >>> >> >> You are wrong, it seems. I found even earlier versions of the code, >> with history, at >> http://sourceforge.net/projects/linux-hfsplus/ . >> The earlier version of the code >> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/) >> does get_page and put_page at bnode_get and bnode_put, while the later version >> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ >> ) >> have those removed, dated 2001 december and 2002 june. > > Interesting. I only looked at historical linux git. > >> The change log says so: >> >> Date: Thu Jun 6 09:45:14 2002 +0000 >> >> Use buffer cache instead of page cache in bnode.c. Cache inode extents. >> >> The change also removed any page_cache_release() . It would seem that >> the additional REF_PAGES >> conditional was a fork/rewrite/clean-up which re-uses the earlier >> development logic, and eventually took over. >> >> The release-early code came first, not after. That makes sense, as >> releasing early > > We may be talking about different things. I did not write what code > came earlier and what after. Anyway, this becomes to difficult to > follow. My point was that in the variant without refs, calling > page_cache_release() right after to read_cache_page() was exactly what > the author wanted (although it is a bug). > > Again, we are probably arguing about different things. And I'm a > little less zealous than you to do this code archeology. > I would really prefer that you don't use personal descriptions such as "zealous" as well as making the kind of comments such as "The result of the test (which I'm sure will show the bug is gone) will be by far more valuable contribution than your other messages in the thread.". Those are not helpful. In my first reply I have already stated my reasoning: your fix consists mainly of removing an "if 0" and re-enabling seemingly dead code. So the dead code was written for a purpose which is now lost and forgotten. And here is that purpose - early development of the code was done with getting and putting pages within each routine, for safeness, and migrating towards caching pages for a longer life cycle in 2001-2002. When the code was cleaned up and re-worked before putting into the kernel in 2005, this development process was repeated but with a REF_PAGES conditional and some additional code which was // commented out. The migration to caching pages would have been more or less correct, if REF_PAGES was flipped and the // commented out code was enabled. Except there was a mistake in one of the conditionals, and also the commented out code never got enabled. The commented out code was where a !REF_PAGES supposed to be. the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES" was a mistake that has propagated. >> and read_cache_page on each bnode_get is the slower, safer but >> inefficient approach. >> It would be absurd to go from caching pages, then to 'experiment' to >> see if release-early + later read_cache_page >> on each bnode_get 'improves'. >> >>>> >>>> Hin-Tak >>>> >>>> <commit> >>>> commit a5e3985fa014029eb6795664c704953720cc7f7d >>>> Author: Roman Zippel <zippel@linux-m68k.org> >>>> Date: Tue Sep 6 15:18:47 2005 -0700 >>>> >>>> [PATCH] hfs: remove debug code >>>> >>>> This removes some old debug code, which is no longer needed. >>>> >>>> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> >>>> Signed-off-by: Andrew Morton <akpm@osdl.org> >>>> Signed-off-by: Linus Torvalds <torvalds@osdl.org> >>>> >>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c >>>> index a096c5a..3d5cdc6 100644 >>>> --- a/fs/hfs/bnode.c >>>> +++ b/fs/hfs/bnode.c >>>> @@ -13,8 +13,6 @@ >>>> >>>> #include "btree.h" >>>> >>>> -#define REF_PAGES 0 >>>> - >>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, >>>> int off, int len) >>>> { >>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>> hfs_btree *tree, u32 cnid) >>>> page_cache_release(page); >>>> goto fail; >>>> } >>>> -#if !REF_PAGES >>>> page_cache_release(page); >>>> -#endif >>>> node->page[i] = page; >>>> } >>>> >>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>> { >>>> if (node) { >>>> atomic_inc(&node->refcnt); >>>> -#if REF_PAGES >>>> - { >>>> - int i; >>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>> - get_page(node->page[i]); >>>> - } >>>> -#endif >>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>> } >>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>> if (!atomic_read(&node->refcnt)) >>>> BUG(); >>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>> -#if REF_PAGES >>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>> - put_page(node->page[i]); >>>> -#endif >>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>> return; >>>> - } >>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>> if (!node->page[i]) >>>> continue; >>>> mark_page_accessed(node->page[i]); >>>> -#if REF_PAGES >>>> - put_page(node->page[i]); >>>> -#endif >>>> } >>>> >>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>> index 8868d3b..b85abc6 100644 >>>> --- a/fs/hfsplus/bnode.c >>>> +++ b/fs/hfsplus/bnode.c >>>> @@ -18,8 +18,6 @@ >>>> #include "hfsplus_fs.h" >>>> #include "hfsplus_raw.h" >>>> >>>> -#define REF_PAGES 0 >>>> - >>>> /* Copy a specified range of bytes from the raw data of a node */ >>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) >>>> { >>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>> hfs_btree *tree, u32 cnid) >>>> page_cache_release(page); >>>> goto fail; >>>> } >>>> -#if !REF_PAGES >>>> page_cache_release(page); >>>> -#endif >>>> node->page[i] = page; >>>> } >>>> >>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>> { >>>> if (node) { >>>> atomic_inc(&node->refcnt); >>>> -#if REF_PAGES >>>> - { >>>> - int i; >>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>> - get_page(node->page[i]); >>>> - } >>>> -#endif >>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>> } >>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>> if (!atomic_read(&node->refcnt)) >>>> BUG(); >>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>> -#if REF_PAGES >>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>> - put_page(node->page[i]); >>>> -#endif >>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>> return; >>>> - } >>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>> if (!node->page[i]) >>>> continue; >>>> mark_page_accessed(node->page[i]); >>>> -#if REF_PAGES >>>> - put_page(node->page[i]); >>>> -#endif >>>> } >>>> >>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>> </commit> >>>> >>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote: >>>>> ------------------------------ >>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote: >>>>> >>>>>>Hi Andrew, >>>>>> >>>>>>Can you please take this patch up and get it merged into mainline? Despite Vyacheslav's lamentations this patch is obviously correct. __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed. >>>>>> >>>>>>Feel free to add: >>>>>> >>>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com> >>>>>> >>>>>>Thanks a lot in advance! >>>>>> >>>>>>Best regards, >>>>>> >>>>>> Anton >>>>> >>>>> I went around and looked at the code and Anton is right - __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong. >>>>> >>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point. >>>>> >>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead. >>>>> I'm looking over the pre-git history to see how that comes about. >>>>> >>>>> Hin-Tak >>>>> >>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote: >>>>>>> >>>>>>> Fix this bugreport by Sasha Levin: >>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode. >>>>>>> >>>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk> >>>>>>> Cc: Christoph Hellwig <hch@infradead.org> >>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> >>>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> >>>>>>> Cc: Sougata Santra <sougata@tuxera.com> >>>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com> >>>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com> >>>>>>> --- >>>>>>> fs/hfsplus/bnode.c | 6 ++---- >>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>>>> index 759708f..5af50fb 100644 >>>>>>> --- a/fs/hfsplus/bnode.c >>>>>>> +++ b/fs/hfsplus/bnode.c >>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) >>>>>>> page_cache_release(page); >>>>>>> goto fail; >>>>>>> } >>>>>>> - page_cache_release(page); >>>>>>> node->page[i] = page; >>>>>>> } >>>>>>> >>>>>>> @@ -566,13 +565,12 @@ node_error: >>>>>>> >>>>>>> void hfs_bnode_free(struct hfs_bnode *node) >>>>>>> { >>>>>>> -#if 0 >>>>>>> int i; >>>>>>> >>>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>>> + for (i = 0; i < node->tree->pages_per_bnode; i++) { >>>>>>> if (node->page[i]) >>>>>>> page_cache_release(node->page[i]); >>>>>>> -#endif >>>>>>> + } >>>>>>> kfree(node); >>>>>>> } >>>>>>> >>>>>>> -- >>>>>>> 2.3.0 >>>>>>> >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>> >>>>>>-- >>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @) >>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ >>>>>>Linux NTFS maintainer >>>>>> >>>>> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote: > On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote: >> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote: >>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >>>>> Hi Andrew and everybody else, >>>>> >>>>> I looked through the pre-git history and seem to have found the reason of how >>>>> the current code had come to be, and why Sergei's fix seems to involve >>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, >>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old >>>>> debug code" >>>>> commit below did not remove the right stuff. >>>>> >>>>> Looking at the old code, I think REF_PAGES may have started out being >>>>> 1 (i.e. pages are released right after create, then get/put when needed, no >>>>> need to release on bnode_free), >>>>> then the code was modified for efficiency so that pages are not released >>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free. >>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented >>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). >>>>> Then it was flipped over, and seems to work, and things were forgotten. >>>>> >>>>> I think the release at bnode_free was commented out because the person who >>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be >>>>> other/more places that one might need to release the pages than just >>>>> at bnode_free(). >>>>> >>>>> Does this train of thought make sense? >>>> >>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well, >>>> at first it was my thought too. But later I considered what the logic >>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and >>>> came to a conclusion that REF_PAGES=1 was a safe approach and >>>> REF_PAGES=0 was an experimental unsafe release-early approach, so >>>> !REF_PAGES was what he really meant. >>>> >>> >>> You are wrong, it seems. I found even earlier versions of the code, >>> with history, at >>> http://sourceforge.net/projects/linux-hfsplus/ . >>> The earlier version of the code >>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/) >>> does get_page and put_page at bnode_get and bnode_put, while the later version >>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ >>> ) >>> have those removed, dated 2001 december and 2002 june. >> >> Interesting. I only looked at historical linux git. >> >>> The change log says so: >>> >>> Date: Thu Jun 6 09:45:14 2002 +0000 >>> >>> Use buffer cache instead of page cache in bnode.c. Cache inode extents. >>> >>> The change also removed any page_cache_release() . It would seem that >>> the additional REF_PAGES >>> conditional was a fork/rewrite/clean-up which re-uses the earlier >>> development logic, and eventually took over. >>> >>> The release-early code came first, not after. That makes sense, as >>> releasing early >> >> We may be talking about different things. I did not write what code >> came earlier and what after. Anyway, this becomes to difficult to >> follow. My point was that in the variant without refs, calling >> page_cache_release() right after to read_cache_page() was exactly what >> the author wanted (although it is a bug). >> >> Again, we are probably arguing about different things. And I'm a >> little less zealous than you to do this code archeology. >> > > I would really prefer that you don't use personal descriptions such as "zealous" > as well as making the kind of comments such as > "The result of the test (which I'm sure will show the bug is gone) will > be by far more valuable contribution than your other messages in the > thread.". Those are not helpful. > > In my first reply I have already stated my reasoning: your fix consists > mainly of removing an "if 0" and re-enabling seemingly dead code. So the > dead code was written for a purpose which is now lost and forgotten. Of course. And it is not hard to guess that purpose quite reliably :). The code used to release pages correctly, i. e. when freeing the bnode structure. > And here is that purpose - early development of the code was done > with getting and putting pages within each routine, for safeness, and migrating > towards caching pages for a longer life cycle in 2001-2002. When the > code was cleaned > up and re-worked before putting into the kernel in 2005, this > development process > was repeated but with a REF_PAGES conditional and some additional code > which was // commented out. > > The migration to caching pages > would have been more or less correct, if REF_PAGES was flipped > and the // commented out code was enabled. Except there was a mistake > in one of the conditionals By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not a mistake under the conditions you provided above. With REF_PAGES flipped to 1 and the // piece uncommented you don't want to release in __hfs_bnode_create() because releasing is done in hfs_bnode_free(). >, and also the commented out code never got > enabled. The commented out code was where a !REF_PAGES supposed to be. > the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES" > was a mistake that has propagated. > > >>> and read_cache_page on each bnode_get is the slower, safer but >>> inefficient approach. >>> It would be absurd to go from caching pages, then to 'experiment' to >>> see if release-early + later read_cache_page >>> on each bnode_get 'improves'. >>> >>>>> >>>>> Hin-Tak >>>>> >>>>> <commit> >>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d >>>>> Author: Roman Zippel <zippel@linux-m68k.org> >>>>> Date: Tue Sep 6 15:18:47 2005 -0700 >>>>> >>>>> [PATCH] hfs: remove debug code >>>>> >>>>> This removes some old debug code, which is no longer needed. >>>>> >>>>> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> >>>>> Signed-off-by: Andrew Morton <akpm@osdl.org> >>>>> Signed-off-by: Linus Torvalds <torvalds@osdl.org> >>>>> >>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c >>>>> index a096c5a..3d5cdc6 100644 >>>>> --- a/fs/hfs/bnode.c >>>>> +++ b/fs/hfs/bnode.c >>>>> @@ -13,8 +13,6 @@ >>>>> >>>>> #include "btree.h" >>>>> >>>>> -#define REF_PAGES 0 >>>>> - >>>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, >>>>> int off, int len) >>>>> { >>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>>> hfs_btree *tree, u32 cnid) >>>>> page_cache_release(page); >>>>> goto fail; >>>>> } >>>>> -#if !REF_PAGES >>>>> page_cache_release(page); >>>>> -#endif >>>>> node->page[i] = page; >>>>> } >>>>> >>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>>> { >>>>> if (node) { >>>>> atomic_inc(&node->refcnt); >>>>> -#if REF_PAGES >>>>> - { >>>>> - int i; >>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>> - get_page(node->page[i]); >>>>> - } >>>>> -#endif >>>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>> } >>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>> if (!atomic_read(&node->refcnt)) >>>>> BUG(); >>>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>>> -#if REF_PAGES >>>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>>> - put_page(node->page[i]); >>>>> -#endif >>>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>>> return; >>>>> - } >>>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>>> if (!node->page[i]) >>>>> continue; >>>>> mark_page_accessed(node->page[i]); >>>>> -#if REF_PAGES >>>>> - put_page(node->page[i]); >>>>> -#endif >>>>> } >>>>> >>>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>> index 8868d3b..b85abc6 100644 >>>>> --- a/fs/hfsplus/bnode.c >>>>> +++ b/fs/hfsplus/bnode.c >>>>> @@ -18,8 +18,6 @@ >>>>> #include "hfsplus_fs.h" >>>>> #include "hfsplus_raw.h" >>>>> >>>>> -#define REF_PAGES 0 >>>>> - >>>>> /* Copy a specified range of bytes from the raw data of a node */ >>>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) >>>>> { >>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>>> hfs_btree *tree, u32 cnid) >>>>> page_cache_release(page); >>>>> goto fail; >>>>> } >>>>> -#if !REF_PAGES >>>>> page_cache_release(page); >>>>> -#endif >>>>> node->page[i] = page; >>>>> } >>>>> >>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>>> { >>>>> if (node) { >>>>> atomic_inc(&node->refcnt); >>>>> -#if REF_PAGES >>>>> - { >>>>> - int i; >>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>> - get_page(node->page[i]); >>>>> - } >>>>> -#endif >>>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>> } >>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>> if (!atomic_read(&node->refcnt)) >>>>> BUG(); >>>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>>> -#if REF_PAGES >>>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>>> - put_page(node->page[i]); >>>>> -#endif >>>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>>> return; >>>>> - } >>>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>>> if (!node->page[i]) >>>>> continue; >>>>> mark_page_accessed(node->page[i]); >>>>> -#if REF_PAGES >>>>> - put_page(node->page[i]); >>>>> -#endif >>>>> } >>>>> >>>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>>> </commit> >>>>> >>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote: >>>>>> ------------------------------ >>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote: >>>>>> >>>>>>>Hi Andrew, >>>>>>> >>>>>>>Can you please take this patch up and get it merged into mainline? Despite Vyacheslav's lamentations this patch is obviously correct. __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed. >>>>>>> >>>>>>>Feel free to add: >>>>>>> >>>>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com> >>>>>>> >>>>>>>Thanks a lot in advance! >>>>>>> >>>>>>>Best regards, >>>>>>> >>>>>>> Anton >>>>>> >>>>>> I went around and looked at the code and Anton is right - __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong. >>>>>> >>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point. >>>>>> >>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead. >>>>>> I'm looking over the pre-git history to see how that comes about. >>>>>> >>>>>> Hin-Tak >>>>>> >>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote: >>>>>>>> >>>>>>>> Fix this bugreport by Sasha Levin: >>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode. >>>>>>>> >>>>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk> >>>>>>>> Cc: Christoph Hellwig <hch@infradead.org> >>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> >>>>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> >>>>>>>> Cc: Sougata Santra <sougata@tuxera.com> >>>>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com> >>>>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com> >>>>>>>> --- >>>>>>>> fs/hfsplus/bnode.c | 6 ++---- >>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>>>>> index 759708f..5af50fb 100644 >>>>>>>> --- a/fs/hfsplus/bnode.c >>>>>>>> +++ b/fs/hfsplus/bnode.c >>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) >>>>>>>> page_cache_release(page); >>>>>>>> goto fail; >>>>>>>> } >>>>>>>> - page_cache_release(page); >>>>>>>> node->page[i] = page; >>>>>>>> } >>>>>>>> >>>>>>>> @@ -566,13 +565,12 @@ node_error: >>>>>>>> >>>>>>>> void hfs_bnode_free(struct hfs_bnode *node) >>>>>>>> { >>>>>>>> -#if 0 >>>>>>>> int i; >>>>>>>> >>>>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>>>> + for (i = 0; i < node->tree->pages_per_bnode; i++) { >>>>>>>> if (node->page[i]) >>>>>>>> page_cache_release(node->page[i]); >>>>>>>> -#endif >>>>>>>> + } >>>>>>>> kfree(node); >>>>>>>> } >>>>>>>> >>>>>>>> -- >>>>>>>> 2.3.0 >>>>>>>> >>>>>>>> -- >>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>> >>>>>>>-- >>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @) >>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ >>>>>>>Linux NTFS maintainer >>>>>>> >>>>>> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18 June 2015 at 21:51, Sergei Antonov <saproj@gmail.com> wrote: > On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >> On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote: >>> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >>>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote: >>>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >>>>>> Hi Andrew and everybody else, >>>>>> >>>>>> I looked through the pre-git history and seem to have found the reason of how >>>>>> the current code had come to be, and why Sergei's fix seems to involve >>>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, >>>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old >>>>>> debug code" >>>>>> commit below did not remove the right stuff. >>>>>> >>>>>> Looking at the old code, I think REF_PAGES may have started out being >>>>>> 1 (i.e. pages are released right after create, then get/put when needed, no >>>>>> need to release on bnode_free), >>>>>> then the code was modified for efficiency so that pages are not released >>>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free. >>>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented >>>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). >>>>>> Then it was flipped over, and seems to work, and things were forgotten. >>>>>> >>>>>> I think the release at bnode_free was commented out because the person who >>>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be >>>>>> other/more places that one might need to release the pages than just >>>>>> at bnode_free(). >>>>>> >>>>>> Does this train of thought make sense? >>>>> >>>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well, >>>>> at first it was my thought too. But later I considered what the logic >>>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and >>>>> came to a conclusion that REF_PAGES=1 was a safe approach and >>>>> REF_PAGES=0 was an experimental unsafe release-early approach, so >>>>> !REF_PAGES was what he really meant. >>>>> >>>> >>>> You are wrong, it seems. I found even earlier versions of the code, >>>> with history, at >>>> http://sourceforge.net/projects/linux-hfsplus/ . >>>> The earlier version of the code >>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/) >>>> does get_page and put_page at bnode_get and bnode_put, while the later version >>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ >>>> ) >>>> have those removed, dated 2001 december and 2002 june. >>> >>> Interesting. I only looked at historical linux git. >>> >>>> The change log says so: >>>> >>>> Date: Thu Jun 6 09:45:14 2002 +0000 >>>> >>>> Use buffer cache instead of page cache in bnode.c. Cache inode extents. >>>> >>>> The change also removed any page_cache_release() . It would seem that >>>> the additional REF_PAGES >>>> conditional was a fork/rewrite/clean-up which re-uses the earlier >>>> development logic, and eventually took over. >>>> >>>> The release-early code came first, not after. That makes sense, as >>>> releasing early >>> >>> We may be talking about different things. I did not write what code >>> came earlier and what after. Anyway, this becomes to difficult to >>> follow. My point was that in the variant without refs, calling >>> page_cache_release() right after to read_cache_page() was exactly what >>> the author wanted (although it is a bug). >>> >>> Again, we are probably arguing about different things. And I'm a >>> little less zealous than you to do this code archeology. >>> >> >> I would really prefer that you don't use personal descriptions such as "zealous" >> as well as making the kind of comments such as >> "The result of the test (which I'm sure will show the bug is gone) will >> be by far more valuable contribution than your other messages in the >> thread.". Those are not helpful. >> >> In my first reply I have already stated my reasoning: your fix consists >> mainly of removing an "if 0" and re-enabling seemingly dead code. So the >> dead code was written for a purpose which is now lost and forgotten. > > Of course. And it is not hard to guess that purpose quite reliably :). > The code used to release pages correctly, i. e. when freeing the bnode > structure. > >> And here is that purpose - early development of the code was done >> with getting and putting pages within each routine, for safeness, and migrating >> towards caching pages for a longer life cycle in 2001-2002. When the >> code was cleaned >> up and re-worked before putting into the kernel in 2005, this >> development process >> was repeated but with a REF_PAGES conditional and some additional code >> which was // commented out. >> >> The migration to caching pages >> would have been more or less correct, if REF_PAGES was flipped >> and the // commented out code was enabled. Except there was a mistake >> in one of the conditionals > > By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not > a mistake under the conditions you provided above. With REF_PAGES > flipped to 1 and the // piece uncommented you don't want to release in > __hfs_bnode_create() because releasing is done in hfs_bnode_free(). > Yes, that's what I am saying. I believe the development of the code happens as follows: - bnode_get() does get_page(), bnode_put() does put_page(), and the rest of the routines - i.e. bnode_create() look up the pages but immediately release them. bnode_free() does nothing about pages. - then it was desired to keep the pages longer without the repeated get_pages and put_pages. so those are "#if REF_PAGES" out. And we no longer get and put pages on demand and should also stop releasing them after looking up at bnode_create() also. (the bnode_create() is a poor name for what it does, but never mind). New code is added to release at bnode_free(). The new code should have been spanned by the opposite, i.e. "#if !REF_PAGES", but was inserted as place holder with //. The rest of the story I have already tried to explain a couple of times. The switch from REF_PAGES=1 to =0 (i.e. from get_pages/put_pages per bnode_get/bnode_put routines, to keeping pages around longer between bnode_create/bnode_free) did not quite work correctly. For the others who want to see the version of the code with those REF_PAGES and want to weigh in on this, I cannot find a web interface for the repo, but it is the commit below in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git commit 91556682e0bf004d98a529bf829d339abb98bbbd Author: Andrew Morton <akpm@osdl.org> Date: Wed Feb 25 16:17:48 2004 -0800 [PATCH] HFS+ support From: Roman Zippel <zippel@linux-m68k.org> This driver adds full read/write support for HFS+ and is based on the readonly driver by Brad Broyer. Thanks to Ethan Benson <erbenson@alaska.net> for a number of patches to make the driver more compliant with the spec. I would also repeat that earlier code and change history was at http://sourceforge.net/projects/linux-hfsplus/, and it showed the changes between 2001 dec to 2002 june from get_page/put_page per bnode_get/bnode_put to removing those and switching to a longer page life-cycle with the log message: "Use buffer cache instead of page cache in bnode.c. Cache inode extents." >>, and also the commented out code never got >> enabled. The commented out code was where a !REF_PAGES supposed to be. >> the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES" >> was a mistake that has propagated. >> >> >>>> and read_cache_page on each bnode_get is the slower, safer but >>>> inefficient approach. >>>> It would be absurd to go from caching pages, then to 'experiment' to >>>> see if release-early + later read_cache_page >>>> on each bnode_get 'improves'. >>>> >>>>>> >>>>>> Hin-Tak >>>>>> >>>>>> <commit> >>>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d >>>>>> Author: Roman Zippel <zippel@linux-m68k.org> >>>>>> Date: Tue Sep 6 15:18:47 2005 -0700 >>>>>> >>>>>> [PATCH] hfs: remove debug code >>>>>> >>>>>> This removes some old debug code, which is no longer needed. >>>>>> >>>>>> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> >>>>>> Signed-off-by: Andrew Morton <akpm@osdl.org> >>>>>> Signed-off-by: Linus Torvalds <torvalds@osdl.org> >>>>>> >>>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c >>>>>> index a096c5a..3d5cdc6 100644 >>>>>> --- a/fs/hfs/bnode.c >>>>>> +++ b/fs/hfs/bnode.c >>>>>> @@ -13,8 +13,6 @@ >>>>>> >>>>>> #include "btree.h" >>>>>> >>>>>> -#define REF_PAGES 0 >>>>>> - >>>>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, >>>>>> int off, int len) >>>>>> { >>>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>>>> hfs_btree *tree, u32 cnid) >>>>>> page_cache_release(page); >>>>>> goto fail; >>>>>> } >>>>>> -#if !REF_PAGES >>>>>> page_cache_release(page); >>>>>> -#endif >>>>>> node->page[i] = page; >>>>>> } >>>>>> >>>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>>>> { >>>>>> if (node) { >>>>>> atomic_inc(&node->refcnt); >>>>>> -#if REF_PAGES >>>>>> - { >>>>>> - int i; >>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>> - get_page(node->page[i]); >>>>>> - } >>>>>> -#endif >>>>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>>> } >>>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>>> if (!atomic_read(&node->refcnt)) >>>>>> BUG(); >>>>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>>>> -#if REF_PAGES >>>>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>>>> - put_page(node->page[i]); >>>>>> -#endif >>>>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>>>> return; >>>>>> - } >>>>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>>>> if (!node->page[i]) >>>>>> continue; >>>>>> mark_page_accessed(node->page[i]); >>>>>> -#if REF_PAGES >>>>>> - put_page(node->page[i]); >>>>>> -#endif >>>>>> } >>>>>> >>>>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>>> index 8868d3b..b85abc6 100644 >>>>>> --- a/fs/hfsplus/bnode.c >>>>>> +++ b/fs/hfsplus/bnode.c >>>>>> @@ -18,8 +18,6 @@ >>>>>> #include "hfsplus_fs.h" >>>>>> #include "hfsplus_raw.h" >>>>>> >>>>>> -#define REF_PAGES 0 >>>>>> - >>>>>> /* Copy a specified range of bytes from the raw data of a node */ >>>>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) >>>>>> { >>>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>>>> hfs_btree *tree, u32 cnid) >>>>>> page_cache_release(page); >>>>>> goto fail; >>>>>> } >>>>>> -#if !REF_PAGES >>>>>> page_cache_release(page); >>>>>> -#endif >>>>>> node->page[i] = page; >>>>>> } >>>>>> >>>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>>>> { >>>>>> if (node) { >>>>>> atomic_inc(&node->refcnt); >>>>>> -#if REF_PAGES >>>>>> - { >>>>>> - int i; >>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>> - get_page(node->page[i]); >>>>>> - } >>>>>> -#endif >>>>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>>> } >>>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>>> if (!atomic_read(&node->refcnt)) >>>>>> BUG(); >>>>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>>>> -#if REF_PAGES >>>>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>>>> - put_page(node->page[i]); >>>>>> -#endif >>>>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>>>> return; >>>>>> - } >>>>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>>>> if (!node->page[i]) >>>>>> continue; >>>>>> mark_page_accessed(node->page[i]); >>>>>> -#if REF_PAGES >>>>>> - put_page(node->page[i]); >>>>>> -#endif >>>>>> } >>>>>> >>>>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>>>> </commit> >>>>>> >>>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote: >>>>>>> ------------------------------ >>>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote: >>>>>>> >>>>>>>>Hi Andrew, >>>>>>>> >>>>>>>>Can you please take this patch up and get it merged into mainline? Despite Vyacheslav's lamentations this patch is obviously correct. __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed. >>>>>>>> >>>>>>>>Feel free to add: >>>>>>>> >>>>>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com> >>>>>>>> >>>>>>>>Thanks a lot in advance! >>>>>>>> >>>>>>>>Best regards, >>>>>>>> >>>>>>>> Anton >>>>>>> >>>>>>> I went around and looked at the code and Anton is right - __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong. >>>>>>> >>>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point. >>>>>>> >>>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead. >>>>>>> I'm looking over the pre-git history to see how that comes about. >>>>>>> >>>>>>> Hin-Tak >>>>>>> >>>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote: >>>>>>>>> >>>>>>>>> Fix this bugreport by Sasha Levin: >>>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode. >>>>>>>>> >>>>>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk> >>>>>>>>> Cc: Christoph Hellwig <hch@infradead.org> >>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>>>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> >>>>>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> >>>>>>>>> Cc: Sougata Santra <sougata@tuxera.com> >>>>>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com> >>>>>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com> >>>>>>>>> --- >>>>>>>>> fs/hfsplus/bnode.c | 6 ++---- >>>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>>>>>> index 759708f..5af50fb 100644 >>>>>>>>> --- a/fs/hfsplus/bnode.c >>>>>>>>> +++ b/fs/hfsplus/bnode.c >>>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) >>>>>>>>> page_cache_release(page); >>>>>>>>> goto fail; >>>>>>>>> } >>>>>>>>> - page_cache_release(page); >>>>>>>>> node->page[i] = page; >>>>>>>>> } >>>>>>>>> >>>>>>>>> @@ -566,13 +565,12 @@ node_error: >>>>>>>>> >>>>>>>>> void hfs_bnode_free(struct hfs_bnode *node) >>>>>>>>> { >>>>>>>>> -#if 0 >>>>>>>>> int i; >>>>>>>>> >>>>>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>>>>> + for (i = 0; i < node->tree->pages_per_bnode; i++) { >>>>>>>>> if (node->page[i]) >>>>>>>>> page_cache_release(node->page[i]); >>>>>>>>> -#endif >>>>>>>>> + } >>>>>>>>> kfree(node); >>>>>>>>> } >>>>>>>>> >>>>>>>>> -- >>>>>>>>> 2.3.0 >>>>>>>>> >>>>>>>>> -- >>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>> >>>>>>>>-- >>>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @) >>>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ >>>>>>>>Linux NTFS maintainer >>>>>>>> >>>>>>> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 19 June 2015 at 00:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote: > On 18 June 2015 at 21:51, Sergei Antonov <saproj@gmail.com> wrote: >> On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >>> On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote: >>>> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >>>>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote: >>>>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >>>>>>> Hi Andrew and everybody else, >>>>>>> >>>>>>> I looked through the pre-git history and seem to have found the reason of how >>>>>>> the current code had come to be, and why Sergei's fix seems to involve >>>>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, >>>>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old >>>>>>> debug code" >>>>>>> commit below did not remove the right stuff. >>>>>>> >>>>>>> Looking at the old code, I think REF_PAGES may have started out being >>>>>>> 1 (i.e. pages are released right after create, then get/put when needed, no >>>>>>> need to release on bnode_free), >>>>>>> then the code was modified for efficiency so that pages are not released >>>>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free. >>>>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented >>>>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). >>>>>>> Then it was flipped over, and seems to work, and things were forgotten. >>>>>>> >>>>>>> I think the release at bnode_free was commented out because the person who >>>>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be >>>>>>> other/more places that one might need to release the pages than just >>>>>>> at bnode_free(). >>>>>>> >>>>>>> Does this train of thought make sense? >>>>>> >>>>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well, >>>>>> at first it was my thought too. But later I considered what the logic >>>>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and >>>>>> came to a conclusion that REF_PAGES=1 was a safe approach and >>>>>> REF_PAGES=0 was an experimental unsafe release-early approach, so >>>>>> !REF_PAGES was what he really meant. >>>>>> >>>>> >>>>> You are wrong, it seems. I found even earlier versions of the code, >>>>> with history, at >>>>> http://sourceforge.net/projects/linux-hfsplus/ . >>>>> The earlier version of the code >>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/) >>>>> does get_page and put_page at bnode_get and bnode_put, while the later version >>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ >>>>> ) >>>>> have those removed, dated 2001 december and 2002 june. >>>> >>>> Interesting. I only looked at historical linux git. >>>> >>>>> The change log says so: >>>>> >>>>> Date: Thu Jun 6 09:45:14 2002 +0000 >>>>> >>>>> Use buffer cache instead of page cache in bnode.c. Cache inode extents. >>>>> >>>>> The change also removed any page_cache_release() . It would seem that >>>>> the additional REF_PAGES >>>>> conditional was a fork/rewrite/clean-up which re-uses the earlier >>>>> development logic, and eventually took over. >>>>> >>>>> The release-early code came first, not after. That makes sense, as >>>>> releasing early >>>> >>>> We may be talking about different things. I did not write what code >>>> came earlier and what after. Anyway, this becomes to difficult to >>>> follow. My point was that in the variant without refs, calling >>>> page_cache_release() right after to read_cache_page() was exactly what >>>> the author wanted (although it is a bug). >>>> >>>> Again, we are probably arguing about different things. And I'm a >>>> little less zealous than you to do this code archeology. >>>> >>> >>> I would really prefer that you don't use personal descriptions such as "zealous" >>> as well as making the kind of comments such as >>> "The result of the test (which I'm sure will show the bug is gone) will >>> be by far more valuable contribution than your other messages in the >>> thread.". Those are not helpful. >>> >>> In my first reply I have already stated my reasoning: your fix consists >>> mainly of removing an "if 0" and re-enabling seemingly dead code. So the >>> dead code was written for a purpose which is now lost and forgotten. >> >> Of course. And it is not hard to guess that purpose quite reliably :). >> The code used to release pages correctly, i. e. when freeing the bnode >> structure. >> >>> And here is that purpose - early development of the code was done >>> with getting and putting pages within each routine, for safeness, and migrating >>> towards caching pages for a longer life cycle in 2001-2002. When the >>> code was cleaned >>> up and re-worked before putting into the kernel in 2005, this >>> development process >>> was repeated but with a REF_PAGES conditional and some additional code >>> which was // commented out. >>> >>> The migration to caching pages >>> would have been more or less correct, if REF_PAGES was flipped >>> and the // commented out code was enabled. Except there was a mistake >>> in one of the conditionals >> >> By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not >> a mistake under the conditions you provided above. With REF_PAGES >> flipped to 1 and the // piece uncommented you don't want to release in >> __hfs_bnode_create() because releasing is done in hfs_bnode_free(). >> > > Yes, that's what I am saying. I believe the development of the code happens > as follows: > > - bnode_get() does get_page(), bnode_put() does put_page(), and the rest > of the routines - i.e. bnode_create() look up the pages but > immediately release them. bnode_free() > does nothing about pages. > > - then it was desired to keep the pages longer without the repeated > get_pages and put_pages. > so those are "#if REF_PAGES" out. And we no longer get and put pages on demand > and should also stop releasing them after looking up at bnode_create() also. > (the bnode_create() is a poor name for what it does, but never mind). New code > is added to release at bnode_free(). The new code should have been spanned > by the opposite, i.e. "#if !REF_PAGES", but was inserted as place > holder with //. > > The rest of the story I have already tried to explain a couple of > times. The switch > from REF_PAGES=1 to =0 (i.e. from get_pages/put_pages per > bnode_get/bnode_put routines, > to keeping pages around longer between bnode_create/bnode_free) did > not quite work correctly. > > For the others who want to see the version of the code with those REF_PAGES > and want to weigh in on this, > I cannot find a web interface for the repo, but it is the commit below > in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git A web link: http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/log/fs/hfsplus/bnode.c > commit 91556682e0bf004d98a529bf829d339abb98bbbd > Author: Andrew Morton <akpm@osdl.org> > Date: Wed Feb 25 16:17:48 2004 -0800 > > [PATCH] HFS+ support > > From: Roman Zippel <zippel@linux-m68k.org> > > This driver adds full read/write support for HFS+ and is based on the > readonly driver by Brad Broyer. > > Thanks to Ethan Benson <erbenson@alaska.net> for a number of patches to > make the driver more compliant with the spec. > > I would also repeat that earlier code and change history was at > http://sourceforge.net/projects/linux-hfsplus/, and it showed the > changes between > 2001 dec to 2002 june > > from get_page/put_page per bnode_get/bnode_put to removing those and > switching to a longer page life-cycle with the log message: > > "Use buffer cache instead of page cache in bnode.c. Cache inode extents." Alright. I did not correctly understand your original idea and write some irrelevant stuff. Maybe you were right. However, it is only a question of historical interest. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c index a096c5a..3d5cdc6 100644 --- a/fs/hfs/bnode.c +++ b/fs/hfs/bnode.c @@ -13,8 +13,6 @@ #include "btree.h" -#define REF_PAGES 0 - void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) { @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) page_cache_release(page); goto fail; } -#if !REF_PAGES page_cache_release(page); -#endif node->page[i] = page; } @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) { if (node) { atomic_inc(&node->refcnt); -#if REF_PAGES - { - int i; - for (i = 0; i < node->tree->pages_per_bnode; i++) - get_page(node->page[i]); - } -#endif dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", node->tree->cnid, node->this, atomic_read(&node->refcnt)); } @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) node->tree->cnid, node->this, atomic_read(&node->refcnt)); if (!atomic_read(&node->refcnt)) BUG(); - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { -#if REF_PAGES - for (i = 0; i < tree->pages_per_bnode; i++) - put_page(node->page[i]); -#endif + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) return; - } for (i = 0; i < tree->pages_per_bnode; i++) { if (!node->page[i]) continue; mark_page_accessed(node->page[i]); -#if REF_PAGES - put_page(node->page[i]); -#endif } if (test_bit(HFS_BNODE_DELETED, &node->flags)) { diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c index 8868d3b..b85abc6 100644 --- a/fs/hfsplus/bnode.c +++ b/fs/hfsplus/bnode.c @@ -18,8 +18,6 @@ #include "hfsplus_fs.h" #include "hfsplus_raw.h" -#define REF_PAGES 0 - /* Copy a specified range of bytes from the raw data of a node */ void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) { @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) page_cache_release(page); goto fail; } -#if !REF_PAGES page_cache_release(page); -#endif node->page[i] = page; }