mbox series

[GIT,PULL] VFIO fix for v6.0-rc5

Message ID 20220909045225.3a572a57.alex.williamson@redhat.com (mailing list archive)
State New, archived
Headers show
Series [GIT,PULL] VFIO fix for v6.0-rc5 | expand

Pull-request

https://github.com/awilliam/linux-vfio.git tags/vfio-v6.0-rc5

Message

Alex Williamson Sept. 9, 2022, 10:52 a.m. UTC
Hi Linus,

The following changes since commit b90cb1053190353cc30f0fef0ef1f378ccc063c5:

  Linux 6.0-rc3 (2022-08-28 15:05:29 -0700)

are available in the Git repository at:

  https://github.com/awilliam/linux-vfio.git tags/vfio-v6.0-rc5

for you to fetch changes up to 873aefb376bbc0ed1dd2381ea1d6ec88106fdbd4:

  vfio/type1: Unpin zero pages (2022-08-31 08:57:30 -0600)

----------------------------------------------------------------
VFIO fix for v6.0-rc5

 - Fix zero page refcount leak (Alex Williamson)

----------------------------------------------------------------
Alex Williamson (1):
      vfio/type1: Unpin zero pages

 drivers/vfio/vfio_iommu_type1.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Linus Torvalds Sept. 9, 2022, 11:53 a.m. UTC | #1
On Fri, Sep 9, 2022 at 6:52 AM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> VFIO fix for v6.0-rc5
>
>  - Fix zero page refcount leak (Alex Williamson)

Ugh. This is disgusting.

Don't get me wrong - I've pulled this, but I think there's some deeper
problem that made this patch required.

Why is pin_user_pages_remote() taking a reference to a reserved page?
Maybe it just shouldn't (and then obviously we should fix the unpin
case to match too).

Adding a few GUP people to the participants for comments.

Anybody?

            Linus
pr-tracker-bot@kernel.org Sept. 9, 2022, 11:59 a.m. UTC | #2
The pull request you sent on Fri, 9 Sep 2022 04:52:25 -0600:

> https://github.com/awilliam/linux-vfio.git tags/vfio-v6.0-rc5

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/725f3f3b2708d8f3fe56df8113bfdc7380d52dc9

Thank you!
David Hildenbrand Sept. 9, 2022, 12:02 p.m. UTC | #3
On 09.09.22 13:53, Linus Torvalds wrote:
> On Fri, Sep 9, 2022 at 6:52 AM Alex Williamson
> <alex.williamson@redhat.com> wrote:
>>
>> VFIO fix for v6.0-rc5
>>
>>   - Fix zero page refcount leak (Alex Williamson)
> 
> Ugh. This is disgusting.
> 
> Don't get me wrong - I've pulled this, but I think there's some deeper
> problem that made this patch required.
> 
> Why is pin_user_pages_remote() taking a reference to a reserved page?
> Maybe it just shouldn't (and then obviously we should fix the unpin
> case to match too).
> 
> Adding a few GUP people to the participants for comments.
> 
> Anybody?

I mentioned in an offline discussion to Alex that we should teach the 
pin/unpin interface to not mess with the zeropage at all (i.e., not 
adjust the refcount and eventually overflow it).

We decided that the unbalanced pin/unpin should be fixed independently, 
such that the refcount handling change on pin/unpin stays GUP internal.
Alex Williamson Sept. 9, 2022, 12:08 p.m. UTC | #4
On Fri, 9 Sep 2022 07:53:17 -0400
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, Sep 9, 2022 at 6:52 AM Alex Williamson
> <alex.williamson@redhat.com> wrote:
> >
> > VFIO fix for v6.0-rc5
> >
> >  - Fix zero page refcount leak (Alex Williamson)  
> 
> Ugh. This is disgusting.
> 
> Don't get me wrong - I've pulled this, but I think there's some deeper
> problem that made this patch required.
> 
> Why is pin_user_pages_remote() taking a reference to a reserved page?
> Maybe it just shouldn't (and then obviously we should fix the unpin
> case to match too).
> 
> Adding a few GUP people to the participants for comments.
> 
> Anybody?

Yes, David is working on allocating pages rather than pinning the zero
page, however that's going to have some user visible locked memory
accounting changes.  This isn't the long term solution, it's only meant
to close the shared zero page refcount holes we have currently.  Thanks,

Alex
Jason Gunthorpe Sept. 9, 2022, 2:31 p.m. UTC | #5
On Fri, Sep 09, 2022 at 06:08:32AM -0600, Alex Williamson wrote:
> On Fri, 9 Sep 2022 07:53:17 -0400
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Fri, Sep 9, 2022 at 6:52 AM Alex Williamson
> > <alex.williamson@redhat.com> wrote:
> > >
> > > VFIO fix for v6.0-rc5
> > >
> > >  - Fix zero page refcount leak (Alex Williamson)  
> > 
> > Ugh. This is disgusting.
> > 
> > Don't get me wrong - I've pulled this, but I think there's some deeper
> > problem that made this patch required.

The basic issue is that VFIO is using the reserved page as an
indication that pin_user_pages_remote() didn't process the page and
instead VFIO's bad follow_pfn() path was used.

It really shouldn't be working like this, which path was used to
obtain the PFN should be recorded independently. VFIO just doesn't
have a datastructure to accommodate it.

Aside from that, most longterm users don't ever even see the zero page
because that typically creates that GUP incoherence we've talked about
alot. The ugly FOLL_FORCE prevents it from happening. VFIO doesn't do
this either so it suffers the incoherence bug along with exposure to
the zero pages :(

David will fix the FOLL_FORCE issue and this will almost go away.

We had a long thread about this and discussed fixing it by simply
setting FOLL_FORCE as other pin_user_page users do. This reached an
interesting point that I'm curious on your view about ABI stability.

It is related to mlock accounting. When vfio pins pages it accounts
for them in mlock. Alex says there are users that set a mlock
limit. VFIO has this historical bug where it doesn't account for every
page it is asked to pin in mlock, eg it ignores the zero page. So, if
we change how GUP or VFIO does page pinning in a way that increases
its mlock charge is this an ABI break?

On one hand users with an very strict limit may find their environment
requires a limit increase after a kernel upgrade. So it fails the
'everything keeps working after kernel upgrade' test.

On the other hand, treating these sorts of limits as ABI means that a
lot of internal stuff is suddenly ABI - eg with a memory cgroup is
increasing the charge on GFP_KERNEL_ACCOUNT also ABI breaking? That
seems unworkable.

Thanks,
Jason
John Hubbard Sept. 9, 2022, 6:58 p.m. UTC | #6
On 9/9/22 05:02, David Hildenbrand wrote:
> On 09.09.22 13:53, Linus Torvalds wrote:
>> On Fri, Sep 9, 2022 at 6:52 AM Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>>>
>>> VFIO fix for v6.0-rc5
>>>
>>>   - Fix zero page refcount leak (Alex Williamson)
>>
>> Ugh. This is disgusting.
>>
>> Don't get me wrong - I've pulled this, but I think there's some deeper
>> problem that made this patch required.
>>
>> Why is pin_user_pages_remote() taking a reference to a reserved page?
>> Maybe it just shouldn't (and then obviously we should fix the unpin
>> case to match too).
>>
>> Adding a few GUP people to the participants for comments.
>>
>> Anybody?
> 
> I mentioned in an offline discussion to Alex that we should teach the 
> pin/unpin interface to not mess with the zeropage at all (i.e., not 
> adjust the refcount and eventually overflow it).

I don't think this is part of the problem. And I sure hope that it's
not. If you can make such a change, and contain it within the gup.c
layer so that callers can still think they are pinning the zero page,
then OK, that works.

But generally, callers expect to be able to pin the zero page, or to at
least believe that they've done so. :)  Sorting it out and treating it
separately requires larger changes to probably quite a lot of subsystems.

As a case in point, very close to my heart, I'm about to add a
pin_user_page*() caller to block/bio, that pins the zero page, after
discussing it with block/fs people [1]. 

> 
> We decided that the unbalanced pin/unpin should be fixed independently, 
> such that the refcount handling change on pin/unpin stays GUP internal.
> 
> 

OK, good.

[1] https://lore.kernel.org/all/20220124221709.kzsaqkdp3gmjie3z@quack3.lan/

thanks,