diff mbox series

mm: remove an avoidable load of page refcount in page_ref_add_unless

Message ID 20241207082931.1707465-1-mjguzik@gmail.com (mailing list archive)
State New
Headers show
Series mm: remove an avoidable load of page refcount in page_ref_add_unless | expand

Commit Message

Mateusz Guzik Dec. 7, 2024, 8:29 a.m. UTC
Explicitly pre-checking the count adds nothing as atomic_add_unless
starts with doing the same thing. iow no functional changes.

disasm of stock filemap_get_read_batch from perf top while running
readseek2_processes -t 24:

  0.04 │ cb:   mov    0x34(%rbx),%eax           # first load
 73.11 │       test   %eax,%eax
       │     ↓ je     1bd
  0.09 │       mov    0x34(%rbx),%eax           # second load
  1.01 │ d9:   test   %eax,%eax
       │     ↓ je     1bd
  0.06 │       lea    0x1(%rax),%edx
  0.00 │       lea    0x34(%rbx),%r14
  0.00 │       lock   cmpxchg %edx,0x34(%rbx)
 14.06 │     ↑ jne    d9

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---

I did not bother benchmarking, I don't think there is anything
warranting it for this one. fwiw it plausibly is worth few % in a
microbenchmark at higher core count.

 include/linux/page_ref.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Hildenbrand Dec. 9, 2024, 9:28 a.m. UTC | #1
On 07.12.24 09:29, Mateusz Guzik wrote:
> Explicitly pre-checking the count adds nothing as atomic_add_unless
> starts with doing the same thing. iow no functional changes.

I recall that we added that check because with the hugetlb vmemmap 
optimization, some of the tail pages we don't ever expect to be modified 
  (because they are fake-duplicated) might be mapped R/O.

If the arch implementation of atomic_add_unless() would trigger an 
unconditional write fault, we'd be in trouble. That would likely only be 
the case if the arch provides a dedicate instruction.

atomic_add_unless()->raw_atomic_add_unless()

Nobody currently defines arch_atomic_add_unless().

raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined 
on some architectures.

I scanned some of the inline-asm, and I think most of them perform a 
check first.


So this currently looks good to me, but we'll rely on the fact that 
atomic_add_unless() will never trigger a write fault if the values 
match. Which makes me wonder if we should document that behavior of 
atomic_add_unless().
Mateusz Guzik Dec. 9, 2024, 10:25 a.m. UTC | #2
On Mon, Dec 9, 2024 at 10:28 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.12.24 09:29, Mateusz Guzik wrote:
> > Explicitly pre-checking the count adds nothing as atomic_add_unless
> > starts with doing the same thing. iow no functional changes.
>
> I recall that we added that check because with the hugetlb vmemmap
> optimization, some of the tail pages we don't ever expect to be modified
>   (because they are fake-duplicated) might be mapped R/O.
>
> If the arch implementation of atomic_add_unless() would trigger an
> unconditional write fault, we'd be in trouble. That would likely only be
> the case if the arch provides a dedicate instruction.
>
> atomic_add_unless()->raw_atomic_add_unless()
>
> Nobody currently defines arch_atomic_add_unless().
>
> raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined
> on some architectures.
>
> I scanned some of the inline-asm, and I think most of them perform a
> check first.
>

Huh.

Some arch triggering a write fault despite not changing the value is
not something I thought about. Sounds pretty broken to me if any arch
was to do it, but then stranger things did happen.

However, if this is seen as a real concern, then I think the best way
forward is to drop the patch (and maybe instead add a comment what's
up with the extra load).
David Hildenbrand Dec. 9, 2024, 10:56 a.m. UTC | #3
On 09.12.24 11:25, Mateusz Guzik wrote:
> On Mon, Dec 9, 2024 at 10:28 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 07.12.24 09:29, Mateusz Guzik wrote:
>>> Explicitly pre-checking the count adds nothing as atomic_add_unless
>>> starts with doing the same thing. iow no functional changes.
>>
>> I recall that we added that check because with the hugetlb vmemmap
>> optimization, some of the tail pages we don't ever expect to be modified
>>    (because they are fake-duplicated) might be mapped R/O.
>>
>> If the arch implementation of atomic_add_unless() would trigger an
>> unconditional write fault, we'd be in trouble. That would likely only be
>> the case if the arch provides a dedicate instruction.
>>
>> atomic_add_unless()->raw_atomic_add_unless()
>>
>> Nobody currently defines arch_atomic_add_unless().
>>
>> raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined
>> on some architectures.
>>
>> I scanned some of the inline-asm, and I think most of them perform a
>> check first.
>>
> 
> Huh.
> 
> Some arch triggering a write fault despite not changing the value is
> not something I thought about. Sounds pretty broken to me if any arch
> was to do it, but then stranger things did happen.

Yeah, it really depends on what the architecture defines. For example, 
on s390x for "COMPARE AND SWAP" the spec states something like

"When the result of the comparison is unequal, the
second operand is loaded at the first-operand loca-
tion, and the second-operand location remains
unchanged. However, on some models, the contents
may be fetched and subsequently stored back
unchanged at the second-operand location. This
update appears to be a block-concurrent interlocked-
update reference as observed by other CPUs."

So there might be an unconditional store on an instruction where one 
would not expect it.


Something similar-but-different recently popped up on aarch64, which 
does what one would expect:

"The atomic RMW instructions, for example, ldadd, actually does load +
add + store in one instruction, it will trigger two page faults per the
ARM64 architecture spec, the first fault is a read fault, the second
fault is a write fault.

Some applications use atomic RMW instructions to populate memory, for
example, openjdk uses atomic-add-0 to do pretouch (populate heap memory
at launch time) between v18 and v22 in order to permit use of memory
concurrently with pretouch." [1]

And Christoph commented:

"x86 does not do a read fault on atomics so we have an issue htere." [2]

I did not check if that is actually true on x86.



> 
> However, if this is seen as a real concern, then I think the best way
> forward is to drop the patch (and maybe instead add a comment what's
> up with the extra load).

I assume we're currently fine because no architecture actually defines 
such an instruction that would be a problem for add_unless.


[1] https://lore.kernel.org/linux-arm-kernel/Zmw1jltdkMrTrT_l@arm.com/T/
[2] 
https://lore.kernel.org/linux-arm-kernel/c1ba9ba3-b0d6-4c6c-d628-614751d737c2@gentwo.org/
Mateusz Guzik Dec. 9, 2024, 12:33 p.m. UTC | #4
On Mon, Dec 9, 2024 at 11:56 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 09.12.24 11:25, Mateusz Guzik wrote:
> > On Mon, Dec 9, 2024 at 10:28 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 07.12.24 09:29, Mateusz Guzik wrote:
> >>> Explicitly pre-checking the count adds nothing as atomic_add_unless
> >>> starts with doing the same thing. iow no functional changes.
> >>
> >> I recall that we added that check because with the hugetlb vmemmap
> >> optimization, some of the tail pages we don't ever expect to be modified
> >>    (because they are fake-duplicated) might be mapped R/O.
> >>
> >> If the arch implementation of atomic_add_unless() would trigger an
> >> unconditional write fault, we'd be in trouble. That would likely only be
> >> the case if the arch provides a dedicate instruction.
> >>
> >> atomic_add_unless()->raw_atomic_add_unless()
> >>
> >> Nobody currently defines arch_atomic_add_unless().
> >>
> >> raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined
> >> on some architectures.
> >>
> >> I scanned some of the inline-asm, and I think most of them perform a
> >> check first.
> >>
> >
> > Huh.
> >
> > Some arch triggering a write fault despite not changing the value is
> > not something I thought about. Sounds pretty broken to me if any arch
> > was to do it, but then stranger things did happen.
>
> Yeah, it really depends on what the architecture defines. For example,
> on s390x for "COMPARE AND SWAP" the spec states something like
>
[snip]

Well in this context you need to do the initial load to even know what
to CAS with, unless you want to blindly do it hoping to get lucky,
which I'm assuming no arch is doing.

Granted, if there was an architecture which had an actual "cas unless
the value is x" then this would not hold, but I don't know of any.
[such an extension would be most welcome fwiw]

Assuming you indeed want the patch after all, can you sort out adding
a comment to atomic_add_unless yourself? ;) I presume you know the
right people and whatnot, so this would cut down on back and forth.

That is to say I think this thread just about exhausted the time
warranted by this patch. No hard feelz if it gets dropped, but then I
do strongly suggest adding a justification to the extra load.

Cheers,
David Hildenbrand Dec. 9, 2024, 2:22 p.m. UTC | #5
On 09.12.24 13:33, Mateusz Guzik wrote:
> On Mon, Dec 9, 2024 at 11:56 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 09.12.24 11:25, Mateusz Guzik wrote:
>>> On Mon, Dec 9, 2024 at 10:28 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 07.12.24 09:29, Mateusz Guzik wrote:
>>>>> Explicitly pre-checking the count adds nothing as atomic_add_unless
>>>>> starts with doing the same thing. iow no functional changes.
>>>>
>>>> I recall that we added that check because with the hugetlb vmemmap
>>>> optimization, some of the tail pages we don't ever expect to be modified
>>>>     (because they are fake-duplicated) might be mapped R/O.
>>>>
>>>> If the arch implementation of atomic_add_unless() would trigger an
>>>> unconditional write fault, we'd be in trouble. That would likely only be
>>>> the case if the arch provides a dedicate instruction.
>>>>
>>>> atomic_add_unless()->raw_atomic_add_unless()
>>>>
>>>> Nobody currently defines arch_atomic_add_unless().
>>>>
>>>> raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined
>>>> on some architectures.
>>>>
>>>> I scanned some of the inline-asm, and I think most of them perform a
>>>> check first.
>>>>
>>>
>>> Huh.
>>>
>>> Some arch triggering a write fault despite not changing the value is
>>> not something I thought about. Sounds pretty broken to me if any arch
>>> was to do it, but then stranger things did happen.
>>
>> Yeah, it really depends on what the architecture defines. For example,
>> on s390x for "COMPARE AND SWAP" the spec states something like
>>
> [snip]
> 
> Well in this context you need to do the initial load to even know what
> to CAS with, unless you want to blindly do it hoping to get lucky,
> which I'm assuming no arch is doing.
> 
> Granted, if there was an architecture which had an actual "cas unless
> the value is x" then this would not hold, but I don't know of any.
> [such an extension would be most welcome fwiw]

Apparently, we prepared for that via arch_atomic_add_unless(), which has no users.

> 
> Assuming you indeed want the patch after all, can you sort out adding
> a comment to atomic_add_unless yourself? ;) I presume you know the
> right people and whatnot, so this would cut down on back and forth.
> 
> That is to say I think this thread just about exhausted the time
> warranted by this patch. No hard feelz if it gets dropped, but then I
> do strongly suggest adding a justification to the extra load.

Maybe it's sufficient for now to simply do your change with a comment:

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 8c236c651d1d6..1efc992ad5687 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -234,7 +234,13 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
  
         rcu_read_lock();
         /* avoid writing to the vmemmap area being remapped */
-       if (!page_is_fake_head(page) && page_ref_count(page) != u)
+       if (!page_is_fake_head(page))
+               /*
+                * atomic_add_unless() will currently never modify the value
+                * if it already is u. If that ever changes, we'd have to have
+                * a separate check here, such that we won't be writing to
+                * write-protected vmemmap areas.
+                */
                 ret = atomic_add_unless(&page->_refcount, nr, u);
         rcu_read_unlock();
  

It would bail out during testing ... hopefully, such that we can detect any such change.
Mateusz Guzik Dec. 9, 2024, 2:30 p.m. UTC | #6
On Mon, Dec 9, 2024 at 3:22 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 09.12.24 13:33, Mateusz Guzik wrote:
> > That is to say I think this thread just about exhausted the time
> > warranted by this patch. No hard feelz if it gets dropped, but then I
> > do strongly suggest adding a justification to the extra load.
>
> Maybe it's sufficient for now to simply do your change with a comment:
>
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index 8c236c651d1d6..1efc992ad5687 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -234,7 +234,13 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
>
>          rcu_read_lock();
>          /* avoid writing to the vmemmap area being remapped */
> -       if (!page_is_fake_head(page) && page_ref_count(page) != u)
> +       if (!page_is_fake_head(page))
> +               /*
> +                * atomic_add_unless() will currently never modify the value
> +                * if it already is u. If that ever changes, we'd have to have
> +                * a separate check here, such that we won't be writing to
> +                * write-protected vmemmap areas.
> +                */
>                  ret = atomic_add_unless(&page->_refcount, nr, u);
>          rcu_read_unlock();
>
>
> It would bail out during testing ... hopefully, such that we can detect any such change.
>

Not my call to make, but looks good. ;)

fwiw I don't need any credit and I would be more than happy if you
just submitted the thing as your own without me being mentioned. *No*
cc would also be appreciated.
David Hildenbrand Dec. 9, 2024, 2:46 p.m. UTC | #7
On 09.12.24 15:30, Mateusz Guzik wrote:
> On Mon, Dec 9, 2024 at 3:22 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 09.12.24 13:33, Mateusz Guzik wrote:
>>> That is to say I think this thread just about exhausted the time
>>> warranted by this patch. No hard feelz if it gets dropped, but then I
>>> do strongly suggest adding a justification to the extra load.
>>
>> Maybe it's sufficient for now to simply do your change with a comment:
>>
>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
>> index 8c236c651d1d6..1efc992ad5687 100644
>> --- a/include/linux/page_ref.h
>> +++ b/include/linux/page_ref.h
>> @@ -234,7 +234,13 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
>>
>>           rcu_read_lock();
>>           /* avoid writing to the vmemmap area being remapped */
>> -       if (!page_is_fake_head(page) && page_ref_count(page) != u)
>> +       if (!page_is_fake_head(page))
>> +               /*
>> +                * atomic_add_unless() will currently never modify the value
>> +                * if it already is u. If that ever changes, we'd have to have
>> +                * a separate check here, such that we won't be writing to
>> +                * write-protected vmemmap areas.
>> +                */
>>                   ret = atomic_add_unless(&page->_refcount, nr, u);
>>           rcu_read_unlock();
>>
>>
>> It would bail out during testing ... hopefully, such that we can detect any such change.
>>
> 
> Not my call to make, but looks good. ;)
> 
> fwiw I don't need any credit and I would be more than happy if you
> just submitted the thing as your own without me being mentioned. *No*
> cc would also be appreciated.

Likely Andrew can add the comment as a fixup.
diff mbox series

Patch

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 8c236c651d1d..fa203894876f 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -234,7 +234,7 @@  static inline bool page_ref_add_unless(struct page *page, int nr, int u)
 
 	rcu_read_lock();
 	/* avoid writing to the vmemmap area being remapped */
-	if (!page_is_fake_head(page) && page_ref_count(page) != u)
+	if (!page_is_fake_head(page))
 		ret = atomic_add_unless(&page->_refcount, nr, u);
 	rcu_read_unlock();