
9p/trans_fd: mark concurrent read and writes to p9_conn->err

Message ID 20250308-p9_conn_err_benign_data_race-v1-1-729e57d5832b@iencinas.com (mailing list archive)
State New

Commit Message

Ignacio Encinas March 8, 2025, 5:47 p.m. UTC
Writes to the error value of a connection are spinlock-protected inside
p9_conn_cancel, but lockless reads are present elsewhere to avoid
performing unnecessary work after an error has occurred.

Mark the write and lockless reads to make KCSAN happy. Mark the write as
exclusive following the recommendation in "Lock-Protected Writes with
Lockless Reads" in tools/memory-model/Documentation/access-marking.txt
while we are at it.

Reported-by: syzbot+d69a7cc8c683c2cb7506@syzkaller.appspotmail.com
Reported-by: syzbot+483d6c9b9231ea7e1851@syzkaller.appspotmail.com
Signed-off-by: Ignacio Encinas <ignacio@iencinas.com>
---
Hello! I noticed these syzbot reports that seem to repeat periodically
and figured I should send a patch. 

The read-paths look very similar to the one changed here [1]. Perhaps it
would make sense to make them the same?

[1] https://lore.kernel.org/all/ZTZtHdqifXlWG8nN@codewreck.org/
---
 net/9p/trans_fd.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)


---
base-commit: 2a520073e74fbb956b5564818fc5529dcc7e9f0e
change-id: 20250308-p9_conn_err_benign_data_race-2758fe8bbed0

Best regards,

Comments

Dominique Martinet March 8, 2025, 10:08 p.m. UTC | #1
Ignacio Encinas wrote on Sat, Mar 08, 2025 at 06:47:38PM +0100:
> Writes to the error value of a connection are spinlock-protected inside
> p9_conn_cancel, but lockless reads are present elsewhere to avoid
> performing unnecessary work after an error has occurred.
> 
> Mark the write and lockless reads to make KCSAN happy. Mark the write as
> exclusive following the recommendation in "Lock-Protected Writes with
> Lockless Reads" in tools/memory-model/Documentation/access-marking.txt
> while we are at it.

Thanks for looking into it; I wasn't aware this could be enough to please
the KCSAN gods.

Unfortunately neither report has a repro, so this will be hard to test,
but I guess it can't hurt, so I will pick this up after a bit.

> Reported-by: syzbot+d69a7cc8c683c2cb7506@syzkaller.appspotmail.com
> Reported-by: syzbot+483d6c9b9231ea7e1851@syzkaller.appspotmail.com
> Signed-off-by: Ignacio Encinas <ignacio@iencinas.com>
> ---
> Hello! I noticed these syzbot reports that seem to repeat periodically
> and figured I should send a patch. 
> 
> The read-paths look very similar to the one changed here [1]. Perhaps it
> would make sense to make them the same?

I've just gone over the read/write work and I think overall the logic
doesn't look too bad, as the checks for m->err are just optimizations
that could be skipped entirely.

For example, even if read work misses the check and receives some data,
p9_tag_lookup is what actually protects the "req": either cancel hasn't
run yet, in which case the request gets two status updates but the
memory is valid and the refcounting is correct, or the cancel already
finished and read work won't find the request.
(I guess one could argue that two status updates could be a problem in
the p9_client_rpc path, but the data actually has been received and the
mount is busted anyway so I don't think any bad bug would happen..
Famous last words, yes)

Write work likewise will just find itself with nothing to do, as the list
has been emptied (and p9_fd_request does check m->err under the lock, so
it can't add new items).

So, sure, they could recheck but I don't see the point; if syzbot is
happy with this patch I think that's good enough.


> [1] https://lore.kernel.org/all/ZTZtHdqifXlWG8nN@codewreck.org/
> ---
>  net/9p/trans_fd.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 196060dc6138af10e99ad04a76ee36a11f770c65..5458e6530084cabeb01d13e9b9a4b1b8f338e494 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -194,9 +194,10 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
>  	if (m->err) {

This is under spin lock and I don't see the compiler reordering this
read and write, but should this also get READ_ONCE?

>  		spin_unlock(&m->req_lock);
>  		return;
>  	}
>  
> -	m->err = err;
> +	WRITE_ONCE(m->err, err);
> +	ASSERT_EXCLUSIVE_WRITER(m->err);

Thanks,
Ignacio Encinas March 9, 2025, 7:35 a.m. UTC | #2
On 8/3/25 23:08, Dominique Martinet wrote:
> Thanks for looking into it; I wasn't aware this could be enough to please
> the KCSAN gods.

Thank you for reviewing it!

> I've just gone over the read/write work and I think overall the logic
> doesn't look too bad, as the checks for m->err are just optimizations
> that could be skipped entirely.

That was my impression too. Thanks for confirming! 
As far as I know, this is as non-problematic as it gets. 

> So, sure, they could recheck but I don't see the point; if syzbot is
> happy with this patch I think that's good enough.

I think KCSAN shouldn't complain anymore. However, let me send a v2:

>> [1] https://lore.kernel.org/all/ZTZtHdqifXlWG8nN@codewreck.org/

I edited this snippet at the last minute because it looked like it should
match the rest of them (just a READ_ONCE, no spinlock):

@@ -673,7 +674,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 
 	spin_lock(&m->req_lock);
 
-	if (m->err < 0) {
+	if (READ_ONCE(m->err) < 0) {
 		spin_unlock(&m->req_lock);
 		return m->err;
 	}

but as I left it, it doesn't make any sense. It's either a racy read +
READ_ONCE to make KCSAN happy or a protected read which shouldn't be a
problem. I'll just drop this hunk and leave it as it was.

>> ---
>>  net/9p/trans_fd.c | 11 ++++++-----
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
>> index 196060dc6138af10e99ad04a76ee36a11f770c65..5458e6530084cabeb01d13e9b9a4b1b8f338e494 100644
>> --- a/net/9p/trans_fd.c
>> +++ b/net/9p/trans_fd.c
>> @@ -194,9 +194,10 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
>>        if (m->err) {
> 
> This is under spin lock and I don't see the compiler reordering this
> read and write, but should this also get READ_ONCE?

It wouldn't hurt, but I don't think it would do anything. spin_lock and
spin_unlock both emit compiler barriers so that code can't be moved out
of critical sections (apart from doing actual locking, release-acquire
ordering ...). I guess the only function of a READ_ONCE here would be to
ensure atomicity of the read, but 

  1) There are no concurrent writes when this read is happening due to
  the spinlock being locked

  2) Getting a torn load is almost impossible(?) as it is an aligned
  4-byte read. Even if the load returned garbage, we would just return
  without reading the actual value.

I'll wait a couple of days to send the v2 in case there is any more
feedback. 

Thanks again!

Patch

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 196060dc6138af10e99ad04a76ee36a11f770c65..5458e6530084cabeb01d13e9b9a4b1b8f338e494 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -196,7 +196,8 @@  static void p9_conn_cancel(struct p9_conn *m, int err)
 		return;
 	}
 
-	m->err = err;
+	WRITE_ONCE(m->err, err);
+	ASSERT_EXCLUSIVE_WRITER(m->err);
 
 	list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) {
 		list_move(&req->req_list, &cancel_list);
@@ -283,7 +284,7 @@  static void p9_read_work(struct work_struct *work)
 
 	m = container_of(work, struct p9_conn, rq);
 
-	if (m->err < 0)
+	if (READ_ONCE(m->err) < 0)
 		return;
 
 	p9_debug(P9_DEBUG_TRANS, "start mux %p pos %zd\n", m, m->rc.offset);
@@ -450,7 +451,7 @@  static void p9_write_work(struct work_struct *work)
 
 	m = container_of(work, struct p9_conn, wq);
 
-	if (m->err < 0) {
+	if (READ_ONCE(m->err) < 0) {
 		clear_bit(Wworksched, &m->wsched);
 		return;
 	}
@@ -622,7 +623,7 @@  static void p9_poll_mux(struct p9_conn *m)
 	__poll_t n;
 	int err = -ECONNRESET;
 
-	if (m->err < 0)
+	if (READ_ONCE(m->err) < 0)
 		return;
 
 	n = p9_fd_poll(m->client, NULL, &err);
@@ -673,7 +674,7 @@  static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 
 	spin_lock(&m->req_lock);
 
-	if (m->err < 0) {
+	if (READ_ONCE(m->err) < 0) {
 		spin_unlock(&m->req_lock);
 		return m->err;
 	}