[0/3] hugetlb: fix potential ref counting races

Message ID 20210710002441.167759-1-mike.kravetz@oracle.com (mailing list archive)

Message

Mike Kravetz July 10, 2021, 12:24 a.m. UTC
When Muchun Song brought up a potential issue with hugetlb ref counting[1],
I started looking closer at the code.  hugetlbfs is the only code with its
own specialized compound page destructor and taking special action when ref
counts drop to zero.  Potential races happen in this unique handling of ref
counts.  The following patches address these races when creating and
destroying hugetlb pages.

These potential races have likely existed since the creation of
hugetlbfs.  They certainly have been around for more than 10 years.
However, I am unaware of anyone actually hitting these races.  It is
VERY unlikely that anyone will actually hit these races, but they do
exist.

I could not think of an easy (or difficult) way to force these races.
Therefore, testing consisted of adding code to randomly increase ref
counts in strategic places.  In this way, I was able to exercise all the
race handling code paths.
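
To illustrate the testing approach described above, here is a minimal
userspace sketch. It is not the code from these patches. The names
fake_page, ref_freeze, maybe_inject_ref, prep_page and run_trials are all
hypothetical, and C11 atomics stand in for the kernel's page ref count
helpers. The idea is only that a test hook randomly takes an extra
reference just before the code expects to hold the sole reference, so the
"someone else has a ref" path runs far more often than it would by chance.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a page and its reference count. */
struct fake_page {
	atomic_int refcount;
};

enum prep_result { PREP_OK, PREP_RACED };

/*
 * "Freeze" the count: atomically set it to 0, but only if it currently
 * equals 'expected'.  Fails if anyone holds an extra reference.
 */
static bool ref_freeze(struct fake_page *p, int expected)
{
	return atomic_compare_exchange_strong(&p->refcount, &expected, 0);
}

/*
 * Test-only hook: take an extra reference with probability
 * 1/inject_one_in.  A value of 0 disables injection.
 */
static bool maybe_inject_ref(struct fake_page *p, unsigned int inject_one_in)
{
	if (inject_one_in && (rand() % inject_one_in) == 0) {
		atomic_fetch_add(&p->refcount, 1);
		return true;
	}
	return false;
}

/* Prepare a "freshly allocated" page; report whether the race path ran. */
static enum prep_result prep_page(struct fake_page *p,
				  unsigned int inject_one_in)
{
	bool injected;

	atomic_store(&p->refcount, 1);	/* allocator hands back one ref */
	injected = maybe_inject_ref(p, inject_one_in);

	if (!ref_freeze(p, 1)) {
		/* Race handling path: an unexpected reference exists. */
		if (injected)
			atomic_fetch_sub(&p->refcount, 1);
		return PREP_RACED;
	}
	return PREP_OK;
}

/* Run many trials; return the success count and store the race count. */
static unsigned int run_trials(unsigned int trials,
			       unsigned int inject_one_in,
			       unsigned int *raced)
{
	struct fake_page page;
	unsigned int ok = 0;

	atomic_init(&page.refcount, 0);
	*raced = 0;
	for (unsigned int i = 0; i < trials; i++) {
		if (prep_page(&page, inject_one_in) == PREP_OK)
			ok++;
		else
			(*raced)++;
	}
	assert(ok + *raced == trials);
	return ok;
}
```

Calling run_trials(1000, 4, &raced) from a small main() would take the
race path in roughly a quarter of the trials, while an inject_one_in of 0
leaves the normal path untouched.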

[1] https://lore.kernel.org/linux-mm/CAMZfGtVMn3daKrJwZMaVOGOaJU+B4dS--x_oPmGQMD=c=QNGEg@mail.gmail.com/

Mike Kravetz (3):
  hugetlb: simplify prep_compound_gigantic_page ref count racing code
  hugetlb: drop ref count earlier after page allocation
  hugetlb: before freeing hugetlb page set dtor to appropriate value

 mm/hugetlb.c | 137 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 104 insertions(+), 33 deletions(-)

Comments

Mike Kravetz July 27, 2021, 11:41 p.m. UTC | #1
Any additional comments on these patches/this approach?

The first patch addressing this issue actually went into the 5.14 merge
window as commit 7118fc2906e2 ("hugetlb: address ref count racing in
prep_compound_gigantic_page").

All this code is very tricky and subtle.  It addresses potential issues
discovered by code analysis.  I do not believe the races have ever been
experienced in practice.  If anyone has suggestions for a simpler or
alternative approach, I would love to hear them.

Muchun Song July 28, 2021, 4:03 a.m. UTC | #2
On Wed, Jul 28, 2021 at 7:41 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> Any additional comments on these patches/this approach?
>
> The first patch addressing this issue actually went into the 5.14 merge
> window as commit 7118fc2906e2 ("hugetlb: address ref count racing in
> prep_compound_gigantic_page").
>
> All this code is very tricky and subtle.  It addresses potential issues
> discovered by code analysis.  I do not believe the races have ever been
> experienced in practice.  If anyone has suggestions for a simpler or
> alternative approach, I would love to hear them.

Hi Mike,

I agree with you that this code is very tricky and subtle. I have looked
at this patch set. For me, I cannot figure out a better solution for this
race.

--
Thanks,
Muchun
