diff mbox series

fuse: annotate potential data-race in num_background

Message ID 20240509125716.1268016-1-leitao@debian.org (mailing list archive)
State New, archived
Headers show
Series fuse: annotate potential data-race in num_background

Commit Message

Breno Leitao May 9, 2024, 12:57 p.m. UTC
A data race occurs when two concurrent data paths potentially access
fuse_conn->num_background simultaneously.

Specifically, fuse_request_end() accesses and modifies ->num_background
while holding the bg_lock, whereas fuse_readahead() reads
->num_background without acquiring any lock beforehand. This potential
data race is flagged by KCSAN:

	BUG: KCSAN: data-race in fuse_readahead [fuse] / fuse_request_end [fuse]

	read-write to 0xffff8883a6666598 of 4 bytes by task 113809 on cpu 39:
	fuse_request_end (fs/fuse/dev.c:318) fuse
	fuse_dev_do_write (fs/fuse/dev.c:?) fuse
	fuse_dev_write (fs/fuse/dev.c:?) fuse
	...

	read to 0xffff8883a6666598 of 4 bytes by task 113787 on cpu 8:
	fuse_readahead (fs/fuse/file.c:1005) fuse
	read_pages (mm/readahead.c:166)
	page_cache_ra_unbounded (mm/readahead.c:?)
	...

	value changed: 0x00000001 -> 0x00000000

Annotate the reader with READ_ONCE() and the writer with WRITE_ONCE()
to avoid such complaints from KCSAN.

Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 fs/fuse/dev.c  | 6 ++++--
 fs/fuse/file.c | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

Comments

Miklos Szeredi May 10, 2024, 9:21 a.m. UTC | #1
On Thu, 9 May 2024 at 14:57, Breno Leitao <leitao@debian.org> wrote:

> Annotate the reader with READ_ONCE() and the writer with WRITE_ONCE()
> to avoid such complaints from KCSAN.

I'm not sure the write side part is really needed, since the lock is
properly protecting against concurrent readers/writers within the
locked region.

Does KCSAN still complain if you just add the READ_ONCE() to fuse_readahead()?

Thanks,
Miklos
Breno Leitao May 13, 2024, 12:41 p.m. UTC | #2
Hello Miklos,

On Fri, May 10, 2024 at 11:21:19AM +0200, Miklos Szeredi wrote:
> On Thu, 9 May 2024 at 14:57, Breno Leitao <leitao@debian.org> wrote:
> 
> > Annotate the reader with READ_ONCE() and the writer with WRITE_ONCE()
> > to avoid such complaints from KCSAN.
> 
> I'm not sure the write side part is really needed, since the lock is
> properly protecting against concurrent readers/writers within the
> locked region.

I understand that num_background is read from an unlocked region
(fuse_readahead()).

> Does KCSAN still complain if you just add the READ_ONCE() to fuse_readahead()?

I haven't checked, but the documentation says that both sides need to
be marked. Here is an example very similar to ours, from
tools/memory-model/Documentation/access-marking.txt:

	Lock-Protected Writes With Lockless Reads
	-----------------------------------------

	For another example, suppose a shared variable "foo" is updated only
	while holding a spinlock, but is read locklessly.  The code might look
	as follows:

		int foo;
		DEFINE_SPINLOCK(foo_lock);

		void update_foo(int newval)
		{
			spin_lock(&foo_lock);
			WRITE_ONCE(foo, newval);
			ASSERT_EXCLUSIVE_WRITER(foo);
			do_something(newval);
			spin_unlock(&foo_lock);
		}

		int read_foo(void)
		{
			do_something_else();
			return READ_ONCE(foo);
		}

	Because foo is read locklessly, all accesses are marked.


From my understanding, we need a WRITE_ONCE() inside the lock, because
taking bg_lock in fuse_request_end() is invisible to fuse_readahead(),
so fuse_readahead() might read a num_background value that was written
non-atomically (i.e. torn) if there is no WRITE_ONCE().

That said, if the reader (fuse_readahead()) can tolerate possibly
corrupted data, we can mark it with a data_race() annotation. Then I
understand we don't need to mark the write with WRITE_ONCE().

Here is what access-marking.txt says about this case:

	Here are some situations where data_race() should be used instead of
	READ_ONCE() and WRITE_ONCE():

	1.      Data-racy loads from shared variables whose values are used only
		for diagnostic purposes.

	2.      Data-racy reads whose values are checked against marked reload.

	3.      Reads whose values feed into error-tolerant heuristics.

	4.      Writes setting values that feed into error-tolerant heuristics.


Anyway, I am more than happy to test with only a READ_ONCE() on the
reader side, if that is the approach you prefer.

Thanks!
Miklos Szeredi May 17, 2024, 3:23 p.m. UTC | #3
On Mon, 13 May 2024 at 14:41, Breno Leitao <leitao@debian.org> wrote:

> That said, if the reader (fuse_readahead()) can handle possible
> corrupted data, we can mark is with data_race() annotation. Then I
> understand we don't need to mark the write with WRITE_ONCE().

Adding Willy, since the readahead code in fuse is fairly special.

I don't think it actually matters if "fc->num_background >=
fc->congestion_threshold" returns a false positive or a false negative,
but I don't have a full understanding of how readahead works.

Willy, can you please look at fuse_readahead() to confirm that
breaking out of the loop is okay if (rac->ra->async_size >=
readahead_count(rac)), no matter what?

Thanks,
Miklos

Patch

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 3ec8bb5e68ff..8e63dba49eff 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -282,6 +282,7 @@  void fuse_request_end(struct fuse_req *req)
 	struct fuse_mount *fm = req->fm;
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_iqueue *fiq = &fc->iq;
+	unsigned int num_background;
 
 	if (test_and_set_bit(FR_FINISHED, &req->flags))
 		goto put_request;
@@ -301,7 +302,8 @@  void fuse_request_end(struct fuse_req *req)
 	if (test_bit(FR_BACKGROUND, &req->flags)) {
 		spin_lock(&fc->bg_lock);
 		clear_bit(FR_BACKGROUND, &req->flags);
-		if (fc->num_background == fc->max_background) {
+		num_background = READ_ONCE(fc->num_background);
+		if (num_background == fc->max_background) {
 			fc->blocked = 0;
 			wake_up(&fc->blocked_waitq);
 		} else if (!fc->blocked) {
@@ -315,7 +317,7 @@  void fuse_request_end(struct fuse_req *req)
 				wake_up(&fc->blocked_waitq);
 		}
 
-		fc->num_background--;
+		WRITE_ONCE(fc->num_background, num_background - 1);
 		fc->active_background--;
 		flush_bg_queue(fc);
 		spin_unlock(&fc->bg_lock);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b57ce4157640..07331889bbf3 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1002,7 +1002,7 @@  static void fuse_readahead(struct readahead_control *rac)
 		struct fuse_io_args *ia;
 		struct fuse_args_pages *ap;
 
-		if (fc->num_background >= fc->congestion_threshold &&
+		if (READ_ONCE(fc->num_background) >= fc->congestion_threshold &&
 		    rac->ra->async_size >= readahead_count(rac))
 			/*
 			 * Congested and only async pages left, so skip the