
[-next] mm/filemap: fix that first page is not mark accessed in filemap_read()

Message ID 20220602082129.2805890-1-yukuai3@huawei.com (mailing list archive)
State New
Series [-next] mm/filemap: fix that first page is not mark accessed in filemap_read()

Commit Message

Yu Kuai June 2, 2022, 8:21 a.m. UTC
In filemap_read(), 'ra->prev_pos' is set to 'iocb->ki_pos + copied',
while it should be 'iocb->ki_pos'. As a consequence,
folio_mark_accessed() will not be called for 'fbatch.folios[0]' since
'iocb->ki_pos' is always equal to 'ra->prev_pos'.

Fixes: 06c0444290ce ("mm/filemap.c: generic_file_buffered_read() now uses find_get_pages_contig")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 mm/filemap.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Andrew Morton June 2, 2022, 6:22 p.m. UTC | #1
On Thu, 2 Jun 2022 16:21:29 +0800 Yu Kuai <yukuai3@huawei.com> wrote:

> In filemap_read(), 'ra->prev_pos' is set to 'iocb->ki_pos + copied',
> while it should be 'iocb->ki_pos'. As a consequence,
> folio_mark_accessed() will not be called for 'fbatch.folios[0]' since
> 'iocb->ki_pos' is always equal to 'ra->prev_pos'.
> 
> ...
>
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2728,10 +2728,11 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>  				flush_dcache_folio(folio);
>  
>  			copied = copy_folio_to_iter(folio, offset, bytes, iter);
> -
> -			already_read += copied;
> -			iocb->ki_pos += copied;
> -			ra->prev_pos = iocb->ki_pos;
> +			if (copied) {
> +				ra->prev_pos = iocb->ki_pos;
> +				already_read += copied;
> +				iocb->ki_pos += copied;
> +			}
>  
>  			if (copied < bytes) {
>  				error = -EFAULT;

It seems tidier, but does it matter?  If copied==0 we're going to break
out and return -EFAULT anyway?
Matthew Wilcox June 2, 2022, 6:30 p.m. UTC | #2
On Thu, Jun 02, 2022 at 04:21:29PM +0800, Yu Kuai wrote:
> In filemap_read(), 'ra->prev_pos' is set to 'iocb->ki_pos + copied',
> while it should be 'iocb->ki_pos'.

Can you walk me through your reasoning which leads you to believe that
it should be ki_pos instead of ki_pos + copied?  As I understand it,
prev_pos is the end of the previous read, not the beginning of the
previous read.

> As a consequence,
> folio_mark_accessed() will not be called for 'fbatch.folios[0]' since
> 'iocb->ki_pos' is always equal to 'ra->prev_pos'.

I don't follow this, but maybe I'm just being slow.
Yu Kuai June 2, 2022, 10:25 p.m. UTC | #3
My apologies for the weird email format; I'm replying to
emails from my cellphone and I'm not able to use my
PC right now...
Yu Kuai June 6, 2022, 1:10 a.m. UTC | #4
On 2022/06/03 2:30, Matthew Wilcox wrote:
> On Thu, Jun 02, 2022 at 04:21:29PM +0800, Yu Kuai wrote:
>> In filemap_read(), 'ra->prev_pos' is set to 'iocb->ki_pos + copied',
>> while it should be 'iocb->ki_pos'.
> 
> Can you walk me through your reasoning which leads you to believe that
> it should be ki_pos instead of ki_pos + copied?  As I understand it,
> prev_pos is the end of the previous read, not the beginning of the
> previous read.

Hi, Matthew

The main reason is the following check in filemap_read():

if (iocb->ki_pos >> PAGE_SHIFT !=	-> current page
    ra->prev_pos >> PAGE_SHIFT)		-> previous page
        folio_mark_accessed(fbatch.folios[0]);

Which means that if the current page is the same as the previous page,
the page is not marked accessed. However, prev_pos is set to
'ki_pos + copied' during the last read, which causes
'prev_pos >> PAGE_SHIFT' to be the current page instead of the
previous page.

I was thinking that if prev_pos is set to the beginning of the previous
read, 'prev_pos >> PAGE_SHIFT' will be the previous page as expected.
Setting it to the end of the previous read is OK too; however, I think
the calculation of the previous page should then be
'(prev_pos - 1) >> PAGE_SHIFT' instead.

> 
> As a consequence,
>> folio_mark_accessed() will not be called for 'fbatch.folios[0]' since
>> 'iocb->ki_pos' is always equal to 'ra->prev_pos'.
> 
> I don't follow this, but maybe I'm just being slow.

I missed a condition here:

Under small-IO sequential reads, folio_mark_accessed() will not be
called for 'fbatch.folios[0]', since 'iocb->ki_pos' is always equal to
'ra->prev_pos'.

Thanks,
Kuai
> 
> .
>
Yu Kuai June 6, 2022, 1:11 a.m. UTC | #5
On 2022/06/03 2:22, Andrew Morton wrote:
> On Thu, 2 Jun 2022 16:21:29 +0800 Yu Kuai <yukuai3@huawei.com> wrote:
> 
>> In filemap_read(), 'ra->prev_pos' is set to 'iocb->ki_pos + copied',
>> while it should be 'iocb->ki_pos'. As a consequence,
>> folio_mark_accessed() will not be called for 'fbatch.folios[0]' since
>> 'iocb->ki_pos' is always equal to 'ra->prev_pos'.
>>
>> ...
>>
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2728,10 +2728,11 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>>   				flush_dcache_folio(folio);
>>   
>>   			copied = copy_folio_to_iter(folio, offset, bytes, iter);
>> -
>> -			already_read += copied;
>> -			iocb->ki_pos += copied;
>> -			ra->prev_pos = iocb->ki_pos;
>> +			if (copied) {
>> +				ra->prev_pos = iocb->ki_pos;
>> +				already_read += copied;
>> +				iocb->ki_pos += copied;
>> +			}
>>   
>>   			if (copied < bytes) {
>>   				error = -EFAULT;
> 
> It seems tidier, but does it matter?  If copied==0 we're going to break
> out and return -EFAULT anyway?
Hi,

Please notice that I set 'prev_pos' to 'ki_pos' first here, instead of
'ki_pos + copied'.

Thanks,
Kuai
> .
>
Yu Kuai June 9, 2022, 12:59 a.m. UTC | #6
friendly ping ...

在 2022/06/02 16:21, Yu Kuai 写道:
> In filemap_read(), 'ra->prev_pos' is set to 'iocb->ki_pos + copied',
> while it should be 'iocb->ki_pos'. As a consequence,
> folio_mark_accessed() will not be called for 'fbatch.folios[0]' since
> 'iocb->ki_pos' is always equal to 'ra->prev_pos'.
> 
> Fixes: 06c0444290ce ("mm/filemap.c: generic_file_buffered_read() now uses find_get_pages_contig")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   mm/filemap.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 9daeaab36081..0b776e504d35 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2728,10 +2728,11 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>   				flush_dcache_folio(folio);
>   
>   			copied = copy_folio_to_iter(folio, offset, bytes, iter);
> -
> -			already_read += copied;
> -			iocb->ki_pos += copied;
> -			ra->prev_pos = iocb->ki_pos;
> +			if (copied) {
> +				ra->prev_pos = iocb->ki_pos;
> +				already_read += copied;
> +				iocb->ki_pos += copied;
> +			}
>   
>   			if (copied < bytes) {
>   				error = -EFAULT;
>
Matthew Wilcox June 10, 2022, 2:34 p.m. UTC | #7
On Mon, Jun 06, 2022 at 09:10:03AM +0800, Yu Kuai wrote:
> On 2022/06/03 2:30, Matthew Wilcox wrote:
> > On Thu, Jun 02, 2022 at 04:21:29PM +0800, Yu Kuai wrote:
> > > In filemap_read(), 'ra->prev_pos' is set to 'iocb->ki_pos + copied',
>>> while it should be 'iocb->ki_pos'.
> > 
> > Can you walk me through your reasoning which leads you to believe that
> > it should be ki_pos instead of ki_pos + copied?  As I understand it,
> > prev_pos is the end of the previous read, not the beginning of the
> > previous read.
> 
> Hi, Matthew
> 
> The main reason is the following check in filemap_read():
> 
> if (iocb->ki_pos >> PAGE_SHIFT !=	-> current page
>     ra->prev_pos >> PAGE_SHIFT)		-> previous page
>         folio_mark_accessed(fbatch.folios[0]);
> 
> Which means that if the current page is the same as the previous
> page, the page is not marked accessed. However, prev_pos is set to
> 'ki_pos + copied' during the last read, which causes
> 'prev_pos >> PAGE_SHIFT' to be the current page instead of the
> previous page.
> 
> I was thinking that if prev_pos is set to the beginning of the
> previous read, 'prev_pos >> PAGE_SHIFT' will be the previous page as
> expected. Setting it to the end of the previous read is OK too;
> however, I think the calculation of the previous page should then be
> '(prev_pos - 1) >> PAGE_SHIFT' instead.

OK, I think Kent broke this in 723ef24b9b37 ("mm/filemap.c: break
generic_file_buffered_read up into multiple functions").  Before:

-       prev_index = ra->prev_pos >> PAGE_SHIFT;
-       prev_offset = ra->prev_pos & (PAGE_SIZE-1);
...
-               if (prev_index != index || offset != prev_offset)
-                       mark_page_accessed(page);

After:
+       if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
+               mark_page_accessed(page);

So surely this should have been:

+       if (iocb->ki_pos != ra->prev_pos)
+               mark_page_accessed(page);

Kent, do you recall why you changed it the way you did?
Matthew Wilcox June 10, 2022, 2:36 p.m. UTC | #8
On Fri, Jun 10, 2022 at 03:34:11PM +0100, Matthew Wilcox wrote:
> On Mon, Jun 06, 2022 at 09:10:03AM +0800, Yu Kuai wrote:
> > On 2022/06/03 2:30, Matthew Wilcox wrote:
> > > On Thu, Jun 02, 2022 at 04:21:29PM +0800, Yu Kuai wrote:
> > > > In filemap_read(), 'ra->prev_pos' is set to 'iocb->ki_pos + copied',
>>>> while it should be 'iocb->ki_pos'.
> > > 
> > > Can you walk me through your reasoning which leads you to believe that
> > > it should be ki_pos instead of ki_pos + copied?  As I understand it,
> > > prev_pos is the end of the previous read, not the beginning of the
> > > previous read.
> > 
> > Hi, Matthew
> > 
> > The main reason is the following check in filemap_read():
> > 
> > if (iocb->ki_pos >> PAGE_SHIFT !=	-> current page
> >     ra->prev_pos >> PAGE_SHIFT)		-> previous page
> >         folio_mark_accessed(fbatch.folios[0]);
> > 
> > Which means that if the current page is the same as the previous
> > page, the page is not marked accessed. However, prev_pos is set to
> > 'ki_pos + copied' during the last read, which causes
> > 'prev_pos >> PAGE_SHIFT' to be the current page instead of the
> > previous page.
> > 
> > I was thinking that if prev_pos is set to the beginning of the
> > previous read, 'prev_pos >> PAGE_SHIFT' will be the previous page as
> > expected. Setting it to the end of the previous read is OK too;
> > however, I think the calculation of the previous page should then be
> > '(prev_pos - 1) >> PAGE_SHIFT' instead.
> 
> OK, I think Kent broke this in 723ef24b9b37 ("mm/filemap.c: break
> generic_file_buffered_read up into multiple functions").  Before:
> 
> -       prev_index = ra->prev_pos >> PAGE_SHIFT;
> -       prev_offset = ra->prev_pos & (PAGE_SIZE-1);
> ...
> -               if (prev_index != index || offset != prev_offset)
> -                       mark_page_accessed(page);
> 
> After:
> +       if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
> +               mark_page_accessed(page);
> 
> So surely this should have been:
> 
> +       if (iocb->ki_pos != ra->prev_pos)
> +               mark_page_accessed(page);
> 
> Kent, do you recall why you changed it the way you did?

Oh, and if this is the right diagnosis, then this is the fix for the
current tree:

+++ b/mm/filemap.c
@@ -2673,8 +2673,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
                 * When a sequential read accesses a page several times, only
                 * mark it as accessed the first time.
                 */
-               if (iocb->ki_pos >> PAGE_SHIFT !=
-                   ra->prev_pos >> PAGE_SHIFT)
+               if (iocb->ki_pos != ra->prev_pos)
                        folio_mark_accessed(fbatch.folios[0]);

                for (i = 0; i < folio_batch_count(&fbatch); i++) {
Kent Overstreet June 10, 2022, 5:23 p.m. UTC | #9
On 6/10/22 10:36, Matthew Wilcox wrote:
> On Fri, Jun 10, 2022 at 03:34:11PM +0100, Matthew Wilcox wrote:
>> On Mon, Jun 06, 2022 at 09:10:03AM +0800, Yu Kuai wrote:
>>> On 2022/06/03 2:30, Matthew Wilcox wrote:
>>>> On Thu, Jun 02, 2022 at 04:21:29PM +0800, Yu Kuai wrote:
>>>>> In filemap_read(), 'ra->prev_pos' is set to 'iocb->ki_pos + copied',
>>>>> while it should be 'iocb->ki_pos'.
>>>>
>>>> Can you walk me through your reasoning which leads you to believe that
>>>> it should be ki_pos instead of ki_pos + copied?  As I understand it,
>>>> prev_pos is the end of the previous read, not the beginning of the
>>>> previous read.
>>>
>>> Hi, Matthew
>>>
>>> The main reason is the following check in filemap_read():
>>>
>>> if (iocb->ki_pos >> PAGE_SHIFT !=	-> current page
>>>      ra->prev_pos >> PAGE_SHIFT)		-> previous page
>>>          folio_mark_accessed(fbatch.folios[0]);
>>>
>>> Which means that if the current page is the same as the previous
>>> page, the page is not marked accessed. However, prev_pos is set to
>>> 'ki_pos + copied' during the last read, which causes
>>> 'prev_pos >> PAGE_SHIFT' to be the current page instead of the
>>> previous page.
>>>
>>> I was thinking that if prev_pos is set to the beginning of the
>>> previous read, 'prev_pos >> PAGE_SHIFT' will be the previous page as
>>> expected. Setting it to the end of the previous read is OK too;
>>> however, I think the calculation of the previous page should then be
>>> '(prev_pos - 1) >> PAGE_SHIFT' instead.
>>
>> OK, I think Kent broke this in 723ef24b9b37 ("mm/filemap.c: break
>> generic_file_buffered_read up into multiple functions").  Before:
>>
>> -       prev_index = ra->prev_pos >> PAGE_SHIFT;
>> -       prev_offset = ra->prev_pos & (PAGE_SIZE-1);
>> ...
>> -               if (prev_index != index || offset != prev_offset)
>> -                       mark_page_accessed(page);
>>
>> After:
>> +       if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
>> +               mark_page_accessed(page);
>>
>> So surely this should have been:
>>
>> +       if (iocb->ki_pos != ra->prev_pos)
>> +               mark_page_accessed(page);
>>
>> Kent, do you recall why you changed it the way you did?

So the idea was that if we're reading from a different _page_ than the 
one we read from previously, we should be marking it accessed. But 
there's an off-by-one error; it should have been

if (iocb->ki_pos >> PAGE_SHIFT != (ra->prev_pos - 1) >> PAGE_SHIFT)
	folio_mark_accessed(fbatch.folios[0])

It looks like this is what Yu Kuai was arriving at too when he was 
saying 'ki_pos + copied - 1'; this is just a cleaner way of writing 
it :)
Kent Overstreet June 10, 2022, 5:47 p.m. UTC | #10
On 6/10/22 10:36, Matthew Wilcox wrote:
> On Fri, Jun 10, 2022 at 03:34:11PM +0100, Matthew Wilcox wrote:
>> On Mon, Jun 06, 2022 at 09:10:03AM +0800, Yu Kuai wrote:
>>> On 2022/06/03 2:30, Matthew Wilcox wrote:
>>>> On Thu, Jun 02, 2022 at 04:21:29PM +0800, Yu Kuai wrote:
>>>>> In filemap_read(), 'ra->prev_pos' is set to 'iocb->ki_pos + copied',
>>>>> while it should be 'iocb->ki_pos'.
>>>>
>>>> Can you walk me through your reasoning which leads you to believe that
>>>> it should be ki_pos instead of ki_pos + copied?  As I understand it,
>>>> prev_pos is the end of the previous read, not the beginning of the
>>>> previous read.
>>>
>>> Hi, Matthew
>>>
>>> The main reason is the following check in filemap_read():
>>>
>>> if (iocb->ki_pos >> PAGE_SHIFT !=	-> current page
>>>      ra->prev_pos >> PAGE_SHIFT)		-> previous page
>>>          folio_mark_accessed(fbatch.folios[0]);
>>>
>>> Which means that if the current page is the same as the previous
>>> page, the page is not marked accessed. However, prev_pos is set to
>>> 'ki_pos + copied' during the last read, which causes
>>> 'prev_pos >> PAGE_SHIFT' to be the current page instead of the
>>> previous page.
>>>
>>> I was thinking that if prev_pos is set to the beginning of the
>>> previous read, 'prev_pos >> PAGE_SHIFT' will be the previous page as
>>> expected. Setting it to the end of the previous read is OK too;
>>> however, I think the calculation of the previous page should then be
>>> '(prev_pos - 1) >> PAGE_SHIFT' instead.
>>
>> OK, I think Kent broke this in 723ef24b9b37 ("mm/filemap.c: break
>> generic_file_buffered_read up into multiple functions").  Before:
>>
>> -       prev_index = ra->prev_pos >> PAGE_SHIFT;
>> -       prev_offset = ra->prev_pos & (PAGE_SIZE-1);
>> ...
>> -               if (prev_index != index || offset != prev_offset)
>> -                       mark_page_accessed(page);
>>
>> After:
>> +       if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
>> +               mark_page_accessed(page);
>>
>> So surely this should have been:
>>
>> +       if (iocb->ki_pos != ra->prev_pos)
>> +               mark_page_accessed(page);
>>
>> Kent, do you recall why you changed it the way you did?
> 
> Oh, and if this is the right diagnosis, then this is the fix for the
> current tree:
> 
> +++ b/mm/filemap.c
> @@ -2673,8 +2673,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>                   * When a sequential read accesses a page several times, only
>                   * mark it as accessed the first time.
>                   */
> -               if (iocb->ki_pos >> PAGE_SHIFT !=
> -                   ra->prev_pos >> PAGE_SHIFT)
> +               if (iocb->ki_pos != ra->prev_pos)
>                          folio_mark_accessed(fbatch.folios[0]);
> 
>                  for (i = 0; i < folio_batch_count(&fbatch); i++) {
> 
> 

I think this is the fix we want - I think Yu basically had the right 
idea and had the off by one fix, this should be clearer though:

Yu, can you confirm the fix?

-- >8 --
Subject: [PATCH] filemap: Fix off by one error when marking folios accessed

In filemap_read() we mark pages accessed as we read them - but we don't
want to do so redundantly, if the previous read already did so.

But there was an off by one error: we want to check if the current page
was the same as the last page we read from, but the last page we read
from was (ra->prev_pos - 1) >> PAGE_SHIFT.

Reported-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

diff --git a/mm/filemap.c b/mm/filemap.c
index 9daeaab360..8d5c8043cb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2704,7 +2704,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
                  * mark it as accessed the first time.
                  */
                 if (iocb->ki_pos >> PAGE_SHIFT !=
-                   ra->prev_pos >> PAGE_SHIFT)
+                   (ra->prev_pos - 1) >> PAGE_SHIFT)
                         folio_mark_accessed(fbatch.folios[0]);

                 for (i = 0; i < folio_batch_count(&fbatch); i++) {
Matthew Wilcox June 10, 2022, 6:34 p.m. UTC | #11
On Fri, Jun 10, 2022 at 01:47:02PM -0400, Kent Overstreet wrote:
> I think this is the fix we want - I think Yu basically had the right idea
> and had the off by one fix, this should be clearer though:
> 
> Yu, can you confirm the fix?
> 
> -- >8 --
> Subject: [PATCH] filemap: Fix off by one error when marking folios accessed
> 
> In filemap_read() we mark pages accessed as we read them - but we don't
> want to do so redundantly, if the previous read already did so.
> 
> But there was an off by one error: we want to check if the current page
> was the same as the last page we read from, but the last page we read
> from was (ra->prev_pos - 1) >> PAGE_SHIFT.
> 
> Reported-by: Yu Kuai <yukuai3@huawei.com>
> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 9daeaab360..8d5c8043cb 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2704,7 +2704,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>                  * mark it as accessed the first time.
>                  */
>                 if (iocb->ki_pos >> PAGE_SHIFT !=
> -                   ra->prev_pos >> PAGE_SHIFT)
> +                   (ra->prev_pos - 1) >> PAGE_SHIFT)
>                         folio_mark_accessed(fbatch.folios[0]);
> 
>                 for (i = 0; i < folio_batch_count(&fbatch); i++) {
> 

This is going to mark the folio as accessed multiple times if it's
a multi-page folio.  How about this one?


diff --git a/mm/filemap.c b/mm/filemap.c
index 5f227b5420d7..a30587f2e598 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2599,6 +2599,13 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
 	return err;
 }
 
+static inline bool pos_same_folio(loff_t pos1, loff_t pos2, struct folio *folio)
+{
+	unsigned int shift = folio_shift(folio);
+
+	return (pos1 >> shift == pos2 >> shift);
+}
+
 /**
  * filemap_read - Read data from the page cache.
  * @iocb: The iocb to read.
@@ -2670,11 +2677,11 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 		writably_mapped = mapping_writably_mapped(mapping);
 
 		/*
-		 * When a sequential read accesses a page several times, only
+		 * When a read accesses the same folio several times, only
 		 * mark it as accessed the first time.
 		 */
-		if (iocb->ki_pos >> PAGE_SHIFT !=
-		    ra->prev_pos >> PAGE_SHIFT)
+		if (!pos_same_folio(iocb->ki_pos, ra->prev_pos - 1,
+							fbatch.folios[0]))
 			folio_mark_accessed(fbatch.folios[0]);
 
 		for (i = 0; i < folio_batch_count(&fbatch); i++) {
Kent Overstreet June 10, 2022, 6:48 p.m. UTC | #12
On 6/10/22 14:34, Matthew Wilcox wrote:
> On Fri, Jun 10, 2022 at 01:47:02PM -0400, Kent Overstreet wrote:
>> I think this is the fix we want - I think Yu basically had the right idea
>> and had the off by one fix, this should be clearer though:
>>
>> Yu, can you confirm the fix?
>>
>> -- >8 --
>> Subject: [PATCH] filemap: Fix off by one error when marking folios accessed
>>
>> In filemap_read() we mark pages accessed as we read them - but we don't
>> want to do so redundantly, if the previous read already did so.
>>
>> But there was an off by one error: we want to check if the current page
>> was the same as the last page we read from, but the last page we read
>> from was (ra->prev_pos - 1) >> PAGE_SHIFT.
>>
>> Reported-by: Yu Kuai <yukuai3@huawei.com>
>> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 9daeaab360..8d5c8043cb 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2704,7 +2704,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>>                   * mark it as accessed the first time.
>>                   */
>>                  if (iocb->ki_pos >> PAGE_SHIFT !=
>> -                   ra->prev_pos >> PAGE_SHIFT)
>> +                   (ra->prev_pos - 1) >> PAGE_SHIFT)
>>                          folio_mark_accessed(fbatch.folios[0]);
>>
>>                  for (i = 0; i < folio_batch_count(&fbatch); i++) {
>>
> 
> This is going to mark the folio as accessed multiple times if it's
> a multi-page folio.  How about this one?

I like that one - you can add my Reviewed-by

> 
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 5f227b5420d7..a30587f2e598 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2599,6 +2599,13 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
>   	return err;
>   }
>   
> +static inline bool pos_same_folio(loff_t pos1, loff_t pos2, struct folio *folio)
> +{
> +	unsigned int shift = folio_shift(folio);
> +
> +	return (pos1 >> shift == pos2 >> shift);
> +}
> +
>   /**
>    * filemap_read - Read data from the page cache.
>    * @iocb: The iocb to read.
> @@ -2670,11 +2677,11 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>   		writably_mapped = mapping_writably_mapped(mapping);
>   
>   		/*
> -		 * When a sequential read accesses a page several times, only
> +		 * When a read accesses the same folio several times, only
>   		 * mark it as accessed the first time.
>   		 */
> -		if (iocb->ki_pos >> PAGE_SHIFT !=
> -		    ra->prev_pos >> PAGE_SHIFT)
> +		if (!pos_same_folio(iocb->ki_pos, ra->prev_pos - 1,
> +							fbatch.folios[0]))
>   			folio_mark_accessed(fbatch.folios[0]);
>   
>   		for (i = 0; i < folio_batch_count(&fbatch); i++) {
Yu Kuai June 11, 2022, 8:23 a.m. UTC | #13
在 2022/06/11 2:34, Matthew Wilcox 写道:
> On Fri, Jun 10, 2022 at 01:47:02PM -0400, Kent Overstreet wrote:
>> I think this is the fix we want - I think Yu basically had the right idea
>> and had the off by one fix, this should be clearer though:
>>
>> Yu, can you confirm the fix?
>>
>> -- >8 --
>> Subject: [PATCH] filemap: Fix off by one error when marking folios accessed
>>
>> In filemap_read() we mark pages accessed as we read them - but we don't
>> want to do so redundantly, if the previous read already did so.
>>
>> But there was an off by one error: we want to check if the current page
>> was the same as the last page we read from, but the last page we read
>> from was (ra->prev_pos - 1) >> PAGE_SHIFT.
>>
>> Reported-by: Yu Kuai <yukuai3@huawei.com>
>> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 9daeaab360..8d5c8043cb 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2704,7 +2704,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>>                   * mark it as accessed the first time.
>>                   */
>>                  if (iocb->ki_pos >> PAGE_SHIFT !=
>> -                   ra->prev_pos >> PAGE_SHIFT)
>> +                   (ra->prev_pos - 1) >> PAGE_SHIFT)
>>                          folio_mark_accessed(fbatch.folios[0]);
>>
>>                  for (i = 0; i < folio_batch_count(&fbatch); i++) {
>>
> 
> This is going to mark the folio as accessed multiple times if it's
> a multi-page folio.  How about this one?
> 
Hi, Matthew

Thanks for the patch, it looks good to me.

BTW, I still think the Fixes tag should point to commit 06c0444290ce
("mm/filemap.c: generic_file_buffered_read() now uses
find_get_pages_contig").
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 5f227b5420d7..a30587f2e598 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2599,6 +2599,13 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
>   	return err;
>   }
>   
> +static inline bool pos_same_folio(loff_t pos1, loff_t pos2, struct folio *folio)
> +{
> +	unsigned int shift = folio_shift(folio);
> +
> +	return (pos1 >> shift == pos2 >> shift);
> +}
> +
>   /**
>    * filemap_read - Read data from the page cache.
>    * @iocb: The iocb to read.
> @@ -2670,11 +2677,11 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>   		writably_mapped = mapping_writably_mapped(mapping);
>   
>   		/*
> -		 * When a sequential read accesses a page several times, only
> +		 * When a read accesses the same folio several times, only
>   		 * mark it as accessed the first time.
>   		 */
> -		if (iocb->ki_pos >> PAGE_SHIFT !=
> -		    ra->prev_pos >> PAGE_SHIFT)
> +		if (!pos_same_folio(iocb->ki_pos, ra->prev_pos - 1,
> +							fbatch.folios[0]))
>   			folio_mark_accessed(fbatch.folios[0]);
>   
>   		for (i = 0; i < folio_batch_count(&fbatch); i++) {
> 
> .
>
Matthew Wilcox June 11, 2022, 5:42 p.m. UTC | #14
On Sat, Jun 11, 2022 at 04:23:42PM +0800, Yu Kuai wrote:
> > This is going to mark the folio as accessed multiple times if it's
> > a multi-page folio.  How about this one?
> > 
> Hi, Matthew
> 
> Thanks for the patch, it looks good to me.

Did you test it?  This is clearly a little subtle ;-)

> BTW, I still think the Fixes tag should point to commit 06c0444290ce
> ("mm/filemap.c: generic_file_buffered_read() now uses
> find_get_pages_contig").

Hmm, yes.  That code also has problems, but they're more subtle and
probably don't amount to much.

-       iocb->ki_pos += copied;
-
-       /*
-        * When a sequential read accesses a page several times,
-        * only mark it as accessed the first time.
-        */
-       if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
-               mark_page_accessed(page);
-
-       ra->prev_pos = iocb->ki_pos;

This will mark the page accessed when we _exit_ a page.  So reading
512-bytes at a time from offset 0, we'll mark page 0 as accessed on the
first read (because the prev_pos is initialised to -1).  Then on the
eighth read, we'll mark page 0 as accessed again (because ki_pos will
now be 4096 and prev_pos is 3584).  We'll then read chunks of page 1
without marking it as accessed, until we're about to step into page 2.

Marking page 0 accessed twice is bad; it'll set the referenced bit the
first time, and then the second time, it'll activate it.  So it'll be
thought to be part of the workingset when it's really just been part of
a streaming read.

And the last page we read will never be marked accessed unless it
happens to finish at the end of a page.

Before Kent started his refactoring, I think it worked:

-       pgoff_t prev_index;
-       unsigned int prev_offset;
...
-       prev_index = ra->prev_pos >> PAGE_SHIFT;
-       prev_offset = ra->prev_pos & (PAGE_SIZE-1);
...
-               if (prev_index != index || offset != prev_offset)
-                       mark_page_accessed(page);
-               prev_index = index;
-               prev_offset = offset;
...
-       ra->prev_pos = prev_index;
-       ra->prev_pos <<= PAGE_SHIFT;
-       ra->prev_pos |= prev_offset;

At least, I don't detect any bugs in this.
Yu Kuai June 13, 2022, 1:31 a.m. UTC | #15
在 2022/06/12 1:42, Matthew Wilcox 写道:
> On Sat, Jun 11, 2022 at 04:23:42PM +0800, Yu Kuai wrote:
>>> This is going to mark the folio as accessed multiple times if it's
>>> a multi-page folio.  How about this one?
>>>
>> Hi, Matthew
>>
>> Thanks for the patch, it looks good to me.
> 
> Did you test it?  This is clearly a little subtle ;-)

Yes, I confirmed that with this patch, small sequential reads mark the
page accessed. However, multi-page folios are not tested yet.

> 
>> BTW, I still think the Fixes tag should point to commit 06c0444290ce
>> ("mm/filemap.c: generic_file_buffered_read() now uses
>> find_get_pages_contig").
> 
> Hmm, yes.  That code also has problems, but they're more subtle and
> probably don't amount to much.
> 
> -       iocb->ki_pos += copied;
> -
> -       /*
> -        * When a sequential read accesses a page several times,
> -        * only mark it as accessed the first time.
> -        */
> -       if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
> -               mark_page_accessed(page);
> -
> -       ra->prev_pos = iocb->ki_pos;
> 
> This will mark the page accessed when we _exit_ a page.  So reading
> 512-bytes at a time from offset 0, we'll mark page 0 as accessed on the
> first read (because the prev_pos is initialised to -1).  Then on the
> eighth read, we'll mark page 0 as accessed again (because ki_pos will
> now be 4096 and prev_pos is 3584).  We'll then read chunks of page 1
> without marking it as accessed, until we're about to step into page 2.

You are right, I didn't think of that situation.
> 
> Marking page 0 accessed twice is bad; it'll set the referenced bit the
> first time, and then the second time, it'll activate it.  So it'll be
> thought to be part of the workingset when it's really just been part of
> a streaming read.
> 
> And the last page we read will never be marked accessed unless it
> happens to finish at the end of a page.
> 
> Before Kent started his refactoring, I think it worked:
> 
> -       pgoff_t prev_index;
> -       unsigned int prev_offset;
> ...
> -       prev_index = ra->prev_pos >> PAGE_SHIFT;
> -       prev_offset = ra->prev_pos & (PAGE_SIZE-1);
> ...
> -               if (prev_index != index || offset != prev_offset)
> -                       mark_page_accessed(page);
> -               prev_index = index;
> -               prev_offset = offset;
> ...
> -       ra->prev_pos = prev_index;
> -       ra->prev_pos <<= PAGE_SHIFT;
> -       ra->prev_pos |= prev_offset;
> 
> At least, I don't detect any bugs in this.

Sure, thanks for your explanation.
kernel test robot June 15, 2022, 8:36 a.m. UTC | #16
Greeting,

FYI, we noticed a -8.1% regression of phoronix-test-suite.fio.SequentialRead.LinuxAIO.Yes.Yes.4KB.DefaultTestDirectory.mb_s due to commit:


commit: 8b157c14b505f861cf8da783ff89f679a0e50abe ("[PATCH -next] mm/filemap: fix that first page is not mark accessed in filemap_read()")
url: https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/mm-filemap-fix-that-first-page-is-not-mark-accessed-in-filemap_read/20220602-161035
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/linux-fsdevel/20220602082129.2805890-1-yukuai3@huawei.com

in testcase: phoronix-test-suite
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
with following parameters:

	test: fio-1.14.1
	option_a: Sequential Read
	option_b: Linux AIO
	option_c: Yes
	option_d: Yes
	option_e: 4KB
	option_f: Default Test Directory
	cpufreq_governor: performance
	ucode: 0x500320a

test-description: The Phoronix Test Suite is the most comprehensive testing and benchmarking platform available, providing an extensible framework to which new tests can easily be added.
test-url: http://www.phoronix-test-suite.com/



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@intel.com>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/option_c/option_d/option_e/option_f/rootfs/tbox_group/test/testcase/ucode:
  gcc-11/performance/x86_64-rhel-8.3/Sequential Read/Linux AIO/Yes/Yes/4KB/Default Test Directory/debian-x86_64-phoronix/lkp-csl-2sp7/fio-1.14.1/phoronix-test-suite/0x500320a

commit: 
  2408f14000 ("Merge branch 'mm-nonmm-unstable' into mm-everything")
  8b157c14b5 ("mm/filemap: fix that first page is not mark accessed in filemap_read()")

2408f140000f9597 8b157c14b505f861cf8da783ff8 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    481388            -8.1%     442333        phoronix-test-suite.fio.SequentialRead.LinuxAIO.Yes.Yes.4KB.DefaultTestDirectory.iops
      1880            -8.1%       1727        phoronix-test-suite.fio.SequentialRead.LinuxAIO.Yes.Yes.4KB.DefaultTestDirectory.mb_s
 2.894e+08            -8.1%  2.659e+08        phoronix-test-suite.time.file_system_inputs
      0.11 ± 22%      -0.0        0.08        mpstat.cpu.all.soft%
    292.39 ± 35%     -35.3%     189.30 ±  8%  sched_debug.cpu.clock_task.stddev
    933030           +47.4%    1374932 ±  2%  numa-meminfo.node0.Active
     92985 ± 16%    +478.0%     537464 ±  6%  numa-meminfo.node0.Active(file)
     23246 ± 16%    +475.4%     133769 ±  6%  numa-vmstat.node0.nr_active_file
     23246 ± 16%    +475.4%     133769 ±  6%  numa-vmstat.node0.nr_zone_active_file
   1181131            -8.1%    1085364        vmstat.io.bi
     20529            -7.4%      19019        vmstat.system.cs
    954480           +45.1%    1384840 ±  3%  meminfo.Active
    112134          +386.0%     544959 ±  7%  meminfo.Active(file)
   2756213           -13.9%    2371792        meminfo.Inactive
   1492877           -25.8%    1108430        meminfo.Inactive(file)
     84.17 ± 10%     -11.7%      74.33        turbostat.Avg_MHz
      4.72 ± 18%      -0.9        3.84        turbostat.Busy%
    854421 ±133%     -82.0%     154039 ± 20%  turbostat.C1
      0.49 ±155%      -0.4        0.06 ± 11%  turbostat.C1%
     28033          +386.2%     136307 ±  7%  proc-vmstat.nr_active_file
    373247           -25.8%     277108        proc-vmstat.nr_inactive_file
     28033          +386.2%     136308 ±  7%  proc-vmstat.nr_zone_active_file
    373247           -25.8%     277108        proc-vmstat.nr_zone_inactive_file
  40703167 ±  2%      -8.5%   37255189        proc-vmstat.numa_hit
  40122593            -7.5%   37096628        proc-vmstat.numa_local
    316253        +10501.8%   33528470        proc-vmstat.pgactivate
  40072448            -7.3%   37140540        proc-vmstat.pgalloc_normal
  39689252            -7.5%   36696525        proc-vmstat.pgfree
 1.447e+08            -8.1%   1.33e+08        proc-vmstat.pgpgin
     22.95 ± 52%     -53.5%      10.67        perf-stat.i.MPKI
 1.088e+09            -3.3%  1.052e+09        perf-stat.i.branch-instructions
  14531811 ± 29%     -30.9%   10047658        perf-stat.i.branch-misses
  31350962            -9.2%   28459348        perf-stat.i.cache-misses
  86567058 ± 24%     -29.3%   61243543        perf-stat.i.cache-references
     21004            -7.6%      19398        perf-stat.i.context-switches
 7.243e+09 ± 11%     -13.5%  6.262e+09        perf-stat.i.cpu-cycles
      0.14 ± 95%      -0.1        0.01 ± 10%  perf-stat.i.dTLB-load-miss-rate%
   1307140 ± 15%     +17.6%    1537276        perf-stat.i.iTLB-loads
 5.234e+09            -2.9%  5.084e+09        perf-stat.i.instructions
      2655 ±  5%     -10.9%       2366 ±  3%  perf-stat.i.instructions-per-iTLB-miss
     75383 ± 11%     -13.5%      65208        perf-stat.i.metric.GHz
   6029414            -6.2%    5655914        perf-stat.i.node-loads
     20.94 ± 15%      +3.7       24.66 ±  3%  perf-stat.i.node-store-miss-rate%
     82166 ± 23%     +29.0%     106019 ±  2%  perf-stat.i.node-store-misses
   6382540            -9.0%    5805257        perf-stat.i.node-stores
     16.54 ± 24%     -27.2%      12.04        perf-stat.overall.MPKI
      2862 ±  5%     -11.1%       2544 ±  3%  perf-stat.overall.instructions-per-iTLB-miss
      5.63 ± 15%      +1.0        6.67        perf-stat.overall.node-load-miss-rate%
      1.27 ± 23%      +0.5        1.79        perf-stat.overall.node-store-miss-rate%
 1.078e+09            -3.3%  1.043e+09        perf-stat.ps.branch-instructions
  14418791 ± 29%     -30.9%    9965662        perf-stat.ps.branch-misses
  31056696            -9.2%   28199667        perf-stat.ps.cache-misses
  85785810 ± 24%     -29.3%   60689278        perf-stat.ps.cache-references
     20807            -7.6%      19221        perf-stat.ps.context-switches
 7.181e+09 ± 11%     -13.5%  6.209e+09        perf-stat.ps.cpu-cycles
   1296058 ± 15%     +17.6%    1524338        perf-stat.ps.iTLB-loads
 5.189e+09            -2.9%   5.04e+09        perf-stat.ps.instructions
   5972497            -6.2%    5604175        perf-stat.ps.node-loads
     81503 ± 23%     +29.0%     105130 ±  2%  perf-stat.ps.node-store-misses
   6322173            -9.0%    5752078        perf-stat.ps.node-stores
 6.205e+11            -2.6%  6.041e+11        perf-stat.total.instructions
      7.61 ± 14%      -1.6        6.00 ± 13%  perf-profile.calltrace.cycles-pp.filemap_get_pages.filemap_read.aio_read.io_submit_one.__x64_sys_io_submit
      4.09 ± 14%      -0.8        3.27 ± 11%  perf-profile.calltrace.cycles-pp.invalidate_mapping_pagevec.generic_fadvise.ksys_fadvise64_64.__x64_sys_fadvise64.do_syscall_64
      4.10 ± 14%      -0.8        3.28 ± 11%  perf-profile.calltrace.cycles-pp.__x64_sys_fadvise64.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.10 ± 14%      -0.8        3.28 ± 11%  perf-profile.calltrace.cycles-pp.ksys_fadvise64_64.__x64_sys_fadvise64.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.10 ± 14%      -0.8        3.28 ± 11%  perf-profile.calltrace.cycles-pp.generic_fadvise.ksys_fadvise64_64.__x64_sys_fadvise64.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.46 ± 15%      -0.5        2.00 ± 11%  perf-profile.calltrace.cycles-pp.__x64_sys_io_getevents.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.35 ± 16%      -0.4        1.90 ± 11%  perf-profile.calltrace.cycles-pp.do_io_getevents.__x64_sys_io_getevents.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.68 ± 16%      -0.4        1.30 ± 15%  perf-profile.calltrace.cycles-pp.read_pages.page_cache_ra_unbounded.filemap_get_pages.filemap_read.aio_read
      1.75 ± 14%      -0.4        1.38 ± 10%  perf-profile.calltrace.cycles-pp.release_pages.__pagevec_release.invalidate_mapping_pagevec.generic_fadvise.ksys_fadvise64_64
      1.77 ± 14%      -0.4        1.40 ± 10%  perf-profile.calltrace.cycles-pp.__pagevec_release.invalidate_mapping_pagevec.generic_fadvise.ksys_fadvise64_64.__x64_sys_fadvise64
      0.89 ± 18%      -0.3        0.59 ± 46%  perf-profile.calltrace.cycles-pp.filemap_get_read_batch.filemap_get_pages.filemap_read.aio_read.io_submit_one
      0.85 ± 18%      -0.3        0.57 ± 46%  perf-profile.calltrace.cycles-pp.ext4_mpage_readpages.read_pages.page_cache_ra_unbounded.filemap_get_pages.filemap_read
      1.49 ± 14%      -0.3        1.22 ± 12%  perf-profile.calltrace.cycles-pp.folio_alloc.page_cache_ra_unbounded.filemap_get_pages.filemap_read.aio_read
      1.32 ± 13%      -0.2        1.08 ± 12%  perf-profile.calltrace.cycles-pp.__alloc_pages.folio_alloc.page_cache_ra_unbounded.filemap_get_pages.filemap_read
      0.98 ± 13%      -0.2        0.76 ± 11%  perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.__pagevec_release.invalidate_mapping_pagevec.generic_fadvise
      1.05 ± 12%      -0.2        0.84 ± 14%  perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.folio_alloc.page_cache_ra_unbounded.filemap_get_pages
      0.90 ± 10%      -0.2        0.72 ± 14%  perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.folio_alloc.page_cache_ra_unbounded
      0.75 ± 15%      -0.2        0.59 ± 10%  perf-profile.calltrace.cycles-pp.free_unref_page_commit.free_unref_page_list.release_pages.__pagevec_release.invalidate_mapping_pagevec
      1.53 ± 17%      +0.4        1.95 ±  9%  perf-profile.calltrace.cycles-pp.schedule.worker_thread.kthread.ret_from_fork
      1.53 ± 17%      +0.4        1.95 ±  9%  perf-profile.calltrace.cycles-pp.__schedule.schedule.worker_thread.kthread.ret_from_fork
      0.00            +1.2        1.17 ± 18%  perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.folio_mark_accessed.filemap_read.aio_read.io_submit_one
      0.31 ±101%      +2.2        2.47 ± 17%  perf-profile.calltrace.cycles-pp.folio_mark_accessed.filemap_read.aio_read.io_submit_one.__x64_sys_io_submit
      7.61 ± 14%      -1.6        6.00 ± 13%  perf-profile.children.cycles-pp.filemap_get_pages
      4.10 ± 14%      -0.8        3.28 ± 11%  perf-profile.children.cycles-pp.__x64_sys_fadvise64
      4.10 ± 14%      -0.8        3.28 ± 11%  perf-profile.children.cycles-pp.ksys_fadvise64_64
      4.10 ± 14%      -0.8        3.28 ± 11%  perf-profile.children.cycles-pp.generic_fadvise
      4.10 ± 14%      -0.8        3.28 ± 11%  perf-profile.children.cycles-pp.invalidate_mapping_pagevec
      2.47 ± 15%      -0.5        2.00 ± 11%  perf-profile.children.cycles-pp.__x64_sys_io_getevents
      2.36 ± 16%      -0.5        1.90 ± 11%  perf-profile.children.cycles-pp.do_io_getevents
      1.68 ± 16%      -0.4        1.30 ± 15%  perf-profile.children.cycles-pp.read_pages
      1.77 ± 14%      -0.4        1.40 ± 10%  perf-profile.children.cycles-pp.__pagevec_release
      1.49 ± 14%      -0.3        1.22 ± 12%  perf-profile.children.cycles-pp.folio_alloc
      1.40 ± 12%      -0.3        1.14 ± 12%  perf-profile.children.cycles-pp.__alloc_pages
      1.16 ± 15%      -0.3        0.90 ± 13%  perf-profile.children.cycles-pp.lookup_ioctx
      0.90 ± 18%      -0.2        0.67 ± 17%  perf-profile.children.cycles-pp.filemap_get_read_batch
      1.00 ± 12%      -0.2        0.78 ± 12%  perf-profile.children.cycles-pp.free_unref_page_list
      1.08 ± 11%      -0.2        0.86 ± 14%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.85 ± 18%      -0.2        0.65 ± 15%  perf-profile.children.cycles-pp.ext4_mpage_readpages
      0.88 ± 16%      -0.2        0.70 ± 14%  perf-profile.children.cycles-pp.__might_resched
      0.93 ± 10%      -0.2        0.75 ± 14%  perf-profile.children.cycles-pp.rmqueue
      0.78 ± 15%      -0.2        0.61 ± 10%  perf-profile.children.cycles-pp.free_unref_page_commit
      0.61 ± 16%      -0.1        0.48 ± 12%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.27 ± 15%      -0.1        0.20 ± 11%  perf-profile.children.cycles-pp.hrtimer_next_event_without
      0.16 ± 22%      -0.1        0.11 ± 19%  perf-profile.children.cycles-pp.hrtimer_update_next_event
      0.08 ± 20%      -0.0        0.04 ± 47%  perf-profile.children.cycles-pp.mem_cgroup_charge_statistics
      0.08 ±  9%      -0.0        0.05 ± 47%  perf-profile.children.cycles-pp.tick_program_event
      1.46 ± 13%      +0.3        1.76 ±  8%  perf-profile.children.cycles-pp.load_balance
      0.00            +0.4        0.43 ± 16%  perf-profile.children.cycles-pp.workingset_age_nonresident
      0.00            +0.7        0.65 ± 17%  perf-profile.children.cycles-pp.workingset_activation
      0.00            +0.7        0.67 ± 17%  perf-profile.children.cycles-pp.__folio_activate
      0.00            +1.2        1.18 ± 18%  perf-profile.children.cycles-pp.pagevec_lru_move_fn
      0.57 ± 17%      +1.9        2.51 ± 17%  perf-profile.children.cycles-pp.folio_mark_accessed
      4.33 ± 17%      -0.9        3.45 ± 13%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      1.36 ± 12%      -0.5        0.84 ± 36%  perf-profile.self.cycles-pp.menu_select
      0.64 ± 17%      -0.2        0.45 ± 18%  perf-profile.self.cycles-pp.filemap_get_read_batch
      0.86 ± 16%      -0.2        0.67 ± 13%  perf-profile.self.cycles-pp.__might_resched
      0.46 ± 19%      -0.1        0.32 ± 18%  perf-profile.self.cycles-pp.__get_user_4
      0.34 ± 10%      -0.1        0.24 ±  3%  perf-profile.self.cycles-pp.copy_page_to_iter
      0.14 ± 17%      -0.0        0.09 ± 32%  perf-profile.self.cycles-pp.aio_prep_rw
      0.11 ± 14%      -0.0        0.07 ± 23%  perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
      0.08 ± 12%      -0.0        0.04 ± 73%  perf-profile.self.cycles-pp.tick_program_event
      0.14 ±  9%      -0.0        0.10 ± 11%  perf-profile.self.cycles-pp.atime_needs_update
      0.00            +0.2        0.22 ± 26%  perf-profile.self.cycles-pp.workingset_activation
      0.00            +0.3        0.29 ± 19%  perf-profile.self.cycles-pp.pagevec_lru_move_fn
      0.00            +0.4        0.35 ± 16%  perf-profile.self.cycles-pp.__folio_activate
      0.00            +0.4        0.43 ± 16%  perf-profile.self.cycles-pp.workingset_age_nonresident




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index 9daeaab36081..0b776e504d35 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2728,10 +2728,11 @@  ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 				flush_dcache_folio(folio);
 
 			copied = copy_folio_to_iter(folio, offset, bytes, iter);
-
-			already_read += copied;
-			iocb->ki_pos += copied;
-			ra->prev_pos = iocb->ki_pos;
+			if (copied) {
+				ra->prev_pos = iocb->ki_pos;
+				already_read += copied;
+				iocb->ki_pos += copied;
+			}
 
 			if (copied < bytes) {
 				error = -EFAULT;