Message ID | 20190328234704.27083-1-mike.kravetz@oracle.com (mailing list archive)
---|---
Series | A couple hugetlbfs fixes
On Thu, Mar 28, 2019 at 04:47:02PM -0700, Mike Kravetz wrote:
> I stumbled on these two hugetlbfs issues while looking at other things:
> - The 'restore reserve' functionality at page free time should not
>   be adjusting subpool counts.
> - A BUG can be triggered (not easily) due to temporarily mapping a
>   page before doing a COW.
>
> Both are described in detail in the commit messages of the patches.
> I would appreciate comments from Davidlohr Bueso, as one patch is
> directly related to code he added in commit 8382d914ebf7.
>
> I did not cc stable, as the first problem has been around since reserves
> were added to hugetlbfs and nobody has noticed. The second is very hard
> to hit/reproduce.
>
> v2 - Update the definition and all callers of hugetlb_fault_mutex_hash,
>      as the arguments mm and vma are no longer used or necessary.
>
> Mike Kravetz (2):
>   hugetlbfs: on restore reserve error path retain subpool reservation
>   hugetlb: use same fault hash key for shared and private mappings
>
>  fs/hugetlbfs/inode.c    |  7 ++-----
>  include/linux/hugetlb.h |  4 +---
>  mm/hugetlb.c            | 43 +++++++++++++++++++++--------------------
>  mm/userfaultfd.c        |  3 +--
>  4 files changed, 26 insertions(+), 31 deletions(-)

Both fixes look fine to me.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
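As a rough illustration of the idea behind patch 2: after the change, hugetlb_fault_mutex_hash is keyed only on the mapping and the file index, so a shared mapping and a private mapping of the same file offset hash to the same fault mutex and serialize against each other. The sketch below models that keying in a self-contained way; the function name, table size, and mixing function are mine for illustration, not the kernel's actual implementation.

```c
#include <stdint.h>

/* Hypothetical simplification: the fault mutex bucket depends only on
 * (mapping, index). Nothing about the VMA (shared vs private) or the
 * mm enters the key, so both kinds of mapping of the same file page
 * pick the same mutex. The multiplicative mix is illustrative only. */
#define NUM_FAULT_MUTEXES 64

static uint32_t fault_mutex_hash(const void *mapping, unsigned long index)
{
	uintptr_t key = (uintptr_t)mapping ^ (index * 2654435761u);

	return (uint32_t)(key % NUM_FAULT_MUTEXES);
}
```

With this keying, a fault on a private mapping and a truncate/hole-punch on the backing file contend on the same mutex whenever they target the same (mapping, index), which is exactly the serialization the patch relies on.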
On Thu, 28 Mar 2019, Mike Kravetz wrote:
> - A BUG can be triggered (not easily) due to temporarily mapping a
>   page before doing a COW.

But you actually _have_ seen it? Do you have the traces? I ask
not because of the patches per se, but because it would be nice
to have a real snippet in the changelog for patch 2.

Thanks,
Davidlohr
On 4/8/19 12:48 PM, Davidlohr Bueso wrote:
> On Thu, 28 Mar 2019, Mike Kravetz wrote:
>
>> - A BUG can be triggered (not easily) due to temporarily mapping a
>>   page before doing a COW.
>
> But you actually _have_ seen it? Do you have the traces? I ask
> not because of the patches per se, but because it would be nice
> to have a real snippet in the changelog for patch 2.

Yes, I actually saw this problem. It happened while I was debugging and
testing some patches for hugetlb migration.

The BUG I hit was in unaccount_page_cache_page():
VM_BUG_ON_PAGE(page_mapped(page), page). The stack trace was something
like:

unaccount_page_cache_page
__delete_from_page_cache
delete_from_page_cache
remove_huge_page
remove_inode_hugepages
hugetlbfs_punch_hole
hugetlbfs_fallocate

When I hit that, it took me a while to figure out how it could happen,
i.e. how could a page be mapped at that point in remove_inode_hugepages?
It checks page_mapped() and we are holding the fault mutex. With some
additional debug code (strategic udelays), I could hit the issue on a
somewhat regular basis, and I verified that another thread was in the
hugetlb_no_page/hugetlb_cow path for the same page at the same time.

Unfortunately, I did not save the traces. I am trying to recreate them
now. However, my test system was recently updated, and it might take a
little time to recreate.
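The race described above can be modeled in userspace. In this sketch (all names are mine, not the kernel's), the fault path briefly sets a "mapped" flag before clearing it again, standing in for hugetlb_no_page temporarily mapping the page before the COW, while the hole-punch path checks the flag, standing in for the page_mapped() check in remove_inode_hugepages. Because both paths take the *same* mutex, which is what keying the hash on (mapping, index) for both shared and private mappings guarantees, the checker can never observe the transient state; under the old scheme the two paths could pick different mutexes and the check could fire.

```c
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t fault_mutex = PTHREAD_MUTEX_INITIALIZER;
static int page_mapped_flag;   /* stands in for page_mapped(page)      */
static int transient_seen;     /* would be the VM_BUG_ON_PAGE firing   */

/* Models hugetlb_no_page temporarily mapping the page before the COW. */
static void *fault_path(void *arg)
{
	(void)arg;
	for (int i = 0; i < 200; i++) {
		pthread_mutex_lock(&fault_mutex);
		page_mapped_flag = 1;  /* transient mapping            */
		usleep(10);            /* widen the window, like the
					* strategic udelays above      */
		page_mapped_flag = 0;  /* unmapped again               */
		pthread_mutex_unlock(&fault_mutex);
	}
	return NULL;
}

/* Models remove_inode_hugepages checking page_mapped under the mutex. */
static void *truncate_path(void *arg)
{
	(void)arg;
	for (int i = 0; i < 200; i++) {
		pthread_mutex_lock(&fault_mutex);
		if (page_mapped_flag)
			transient_seen = 1;  /* would be the BUG     */
		pthread_mutex_unlock(&fault_mutex);
	}
	return NULL;
}

/* Returns 0 if the checker never saw the transient mapping. */
int run_demo(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, fault_path, NULL);
	pthread_create(&b, NULL, truncate_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return transient_seen;
}
```

Compile with -pthread. If you replace the single fault_mutex with two different mutexes for the two threads, mimicking the old per-(mm, vma, address) keying for private mappings, the checker can observe the window and transient_seen can become 1.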