diff mbox series

[v3] ceph: invalidate pages when doing direct/sync writes

Message ID 20220407143834.7516-1-lhenriques@suse.de (mailing list archive)
State New, archived
Headers show
Series [v3] ceph: invalidate pages when doing direct/sync writes | expand

Commit Message

Luis Henriques April 7, 2022, 2:38 p.m. UTC
When doing a direct/sync write, we need to invalidate the page cache in
the range being written to.  If we don't do this, the cache will include
invalid data as we just did a write that avoided the page cache.

Signed-off-by: Luís Henriques <lhenriques@suse.de>
---
 fs/ceph/file.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Ok, here's a new attempt.  After discussion in this thread and on IRC, I
think this is the right fix.  generic/647 now passes with and without
encryption.  Thanks!

Changes since v2:
- Invalidation needs to be done after a write

Changes since v1:
- Replaced truncate_inode_pages_range() by invalidate_inode_pages2_range
- Call fscache_invalidate with FSCACHE_INVAL_DIO_WRITE if we're doing DIO

Comments

Jeff Layton April 7, 2022, 2:42 p.m. UTC | #1
On Thu, 2022-04-07 at 15:38 +0100, Luís Henriques wrote:
> When doing a direct/sync write, we need to invalidate the page cache in
> the range being written to.  If we don't do this, the cache will include
> invalid data as we just did a write that avoided the page cache.
> 
> Signed-off-by: Luís Henriques <lhenriques@suse.de>
> ---
>  fs/ceph/file.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> Ok, here's a new attempt.  After discussion in this thread and on IRC, I
> think this is the right fix.  generic/647 now passes with and without
> encryption.  Thanks!
> 
> Changes since v2:
> - Invalidation needs to be done after a write
> 
> Changes since v1:
> - Replaced truncate_inode_pages_range() by invalidate_inode_pages2_range
> - Call fscache_invalidate with FSCACHE_INVAL_DIO_WRITE if we're doing DIO
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 5072570c2203..63e67eb60310 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1938,6 +1938,15 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>  			break;
>  		}
>  		ceph_clear_error_write(ci);
> +		ret = invalidate_inode_pages2_range(
> +				inode->i_mapping,
> +				pos >> PAGE_SHIFT,
> +				(pos + len - 1) >> PAGE_SHIFT);
> +		if (ret < 0) {
> +			dout("invalidate_inode_pages2_range returned %d\n",
> +			     ret);
> +			ret = 0;
> +		}
>  		pos += len;
>  		written += len;
>  		dout("sync_write written %d\n", written);

Looks good. I suspect we can also remove the
invalidate_indode_pages2_range call earlier in this function too. I may
roll that into this patch.

I'll give this an xfstests run with fscrypt enabled and see how it does.

Thanks!
Xiubo Li April 7, 2022, 2:48 p.m. UTC | #2
On 4/7/22 10:38 PM, Luís Henriques wrote:
> When doing a direct/sync write, we need to invalidate the page cache in
> the range being written to.  If we don't do this, the cache will include
> invalid data as we just did a write that avoided the page cache.
>
> Signed-off-by: Luís Henriques <lhenriques@suse.de>
> ---
>   fs/ceph/file.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> Ok, here's a new attempt.  After discussion in this thread and on IRC, I
> think this is the right fix.  generic/647 now passes with and without
> encryption.  Thanks!
>
> Changes since v2:
> - Invalidation needs to be done after a write
>
> Changes since v1:
> - Replaced truncate_inode_pages_range() by invalidate_inode_pages2_range
> - Call fscache_invalidate with FSCACHE_INVAL_DIO_WRITE if we're doing DIO
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 5072570c2203..63e67eb60310 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1938,6 +1938,15 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>   			break;
>   		}
>   		ceph_clear_error_write(ci);
> +		ret = invalidate_inode_pages2_range(
> +				inode->i_mapping,
> +				pos >> PAGE_SHIFT,
> +				(pos + len - 1) >> PAGE_SHIFT);
> +		if (ret < 0) {
> +			dout("invalidate_inode_pages2_range returned %d\n",
> +			     ret);
> +			ret = 0;
> +		}
>   		pos += len;
>   		written += len;
>   		dout("sync_write written %d\n", written);
>
LGTM.

Maybe it worth adding a comment to explain why we need this and where 
the mapping come from ?

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Luis Henriques April 7, 2022, 3:03 p.m. UTC | #3
Jeff Layton <jlayton@kernel.org> writes:

> On Thu, 2022-04-07 at 15:38 +0100, Luís Henriques wrote:
>> When doing a direct/sync write, we need to invalidate the page cache in
>> the range being written to.  If we don't do this, the cache will include
>> invalid data as we just did a write that avoided the page cache.
>> 
>> Signed-off-by: Luís Henriques <lhenriques@suse.de>
>> ---
>>  fs/ceph/file.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>> 
>> Ok, here's a new attempt.  After discussion in this thread and on IRC, I
>> think this is the right fix.  generic/647 now passes with and without
>> encryption.  Thanks!
>> 
>> Changes since v2:
>> - Invalidation needs to be done after a write
>> 
>> Changes since v1:
>> - Replaced truncate_inode_pages_range() by invalidate_inode_pages2_range
>> - Call fscache_invalidate with FSCACHE_INVAL_DIO_WRITE if we're doing DIO
>> 
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 5072570c2203..63e67eb60310 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -1938,6 +1938,15 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>>  			break;
>>  		}
>>  		ceph_clear_error_write(ci);
>> +		ret = invalidate_inode_pages2_range(
>> +				inode->i_mapping,
>> +				pos >> PAGE_SHIFT,
>> +				(pos + len - 1) >> PAGE_SHIFT);
>> +		if (ret < 0) {
>> +			dout("invalidate_inode_pages2_range returned %d\n",
>> +			     ret);
>> +			ret = 0;
>> +		}
>>  		pos += len;
>>  		written += len;
>>  		dout("sync_write written %d\n", written);
>
> Looks good. I suspect we can also remove the
> invalidate_indode_pages2_range call earlier in this function too. I may
> roll that into this patch.

Right, that occurred to me as well but I wasn't really sure that would be
safe.

> I'll give this an xfstests run with fscrypt enabled and see how it does.

I'll do the same here, I just run a few tests on a vstart cluster.  But
I'll definitely give it some more testing.

Cheers,
Luis Henriques April 7, 2022, 3:05 p.m. UTC | #4
Xiubo Li <xiubli@redhat.com> writes:

> On 4/7/22 10:38 PM, Luís Henriques wrote:
>> When doing a direct/sync write, we need to invalidate the page cache in
>> the range being written to.  If we don't do this, the cache will include
>> invalid data as we just did a write that avoided the page cache.
>>
>> Signed-off-by: Luís Henriques <lhenriques@suse.de>
>> ---
>>   fs/ceph/file.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> Ok, here's a new attempt.  After discussion in this thread and on IRC, I
>> think this is the right fix.  generic/647 now passes with and without
>> encryption.  Thanks!
>>
>> Changes since v2:
>> - Invalidation needs to be done after a write
>>
>> Changes since v1:
>> - Replaced truncate_inode_pages_range() by invalidate_inode_pages2_range
>> - Call fscache_invalidate with FSCACHE_INVAL_DIO_WRITE if we're doing DIO
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 5072570c2203..63e67eb60310 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -1938,6 +1938,15 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>>   			break;
>>   		}
>>   		ceph_clear_error_write(ci);
>> +		ret = invalidate_inode_pages2_range(
>> +				inode->i_mapping,
>> +				pos >> PAGE_SHIFT,
>> +				(pos + len - 1) >> PAGE_SHIFT);
>> +		if (ret < 0) {
>> +			dout("invalidate_inode_pages2_range returned %d\n",
>> +			     ret);
>> +			ret = 0;
>> +		}
>>   		pos += len;
>>   		written += len;
>>   		dout("sync_write written %d\n", written);
>>
> LGTM.
>
> Maybe it worth adding a comment to explain why we need this and where the
> mapping come from ?
>
> Reviewed-by: Xiubo Li <xiubli@redhat.com>
>

Sure, I'll send out v4 with an extra comment.  Thanks.

Cheers,
diff mbox series

Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 5072570c2203..63e67eb60310 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1938,6 +1938,15 @@  ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 			break;
 		}
 		ceph_clear_error_write(ci);
+		ret = invalidate_inode_pages2_range(
+				inode->i_mapping,
+				pos >> PAGE_SHIFT,
+				(pos + len - 1) >> PAGE_SHIFT);
+		if (ret < 0) {
+			dout("invalidate_inode_pages2_range returned %d\n",
+			     ret);
+			ret = 0;
+		}
 		pos += len;
 		written += len;
 		dout("sync_write written %d\n", written);